CRYSTALLOGRAPHY METHODS
TECHNICAL FIELD 

The present invention relates to protein crystallography methods and
constructs useful therein, in particular to fusion proteins comprising a first
protein and a second protein, whereby the first protein upon crystallization
yields crystals having available space in the lattice, so as to allow for the
ordered packing of the second protein into the said available space. The
invention also relates to methods of crystallization of such a second protein,
which can be, e.g., a protein such as a membrane protein, which is otherwise
difficult to crystallize. The invention further relates to, inter alia,
recombinant vectors adapted for the expression of a fusion protein as
described above.

BACKGROUND ART 

Membrane proteins are involved in a multitude of biological processes: the
respiratory chain, photosynthesis, transport of solute molecules and ions and 
regulating cellular responses to a wide range of biological molecules such 
as hormones, neurotransmitters and drugs. High resolution structural data 
has allowed useful insight into the function of a number of integral 
membrane proteins including the G-protein coupled receptor (GPCR), 
rhodopsin (Palczewski et al, 2000), the bc1 complex (Iwata et al 1998),
bacteriorhodopsin (Pebay-Peyroula et al, 1997) and cytochrome c oxidase 
(Iwata et al, 1995, Tsukihara et al, 1996). Despite these successes the 
number of resolved membrane protein structures remains extremely small 
compared with soluble proteins. 

Crystallization is necessary to obtain the three-dimensional structure of 
proteins; it often represents the bottleneck in structure determination. In 
particular, crystallization of membrane proteins has been difficult (for
reviews, see e.g. Durbin & Feher (1996) Annual Review of Physical
Chemistry 47, 171-204; and Garavito et al. (1996) Journal of Bioenergetics
& Biomembranes 28, 13-27). To date, of approximately 12,000 protein
structures deposited in the Brookhaven Protein Data Bank, only 20 are
membrane proteins. Furthermore, of these 20, only two are eukaryotic in
origin (Iwata et al. (1998) Science 281, 64-71; and Tsukihara et al. (1996)
Science 272, 1136-1144). These are tiny numbers in light of the estimate
that 35-40% of the genes within the human genome code for
integral membrane or membrane-associated proteins. Low levels of
endogenous expression, high hydrophobicity and low stability in solution all
combine to frustrate the membrane protein crystallographer. In many cases,
even if enough pure protein can be obtained, it is impossible to grow
crystals suitable for structural analysis, if indeed crystals grow at all. In
addition, many proteins fail to be produced in E. coli-based expression
systems, which have a number of advantages over other expression systems,
including low cost and rapid production of high cell densities.

Progress in solving some of these problems has been made using fusion
protein technology. Fusion of soluble proteins, such as maltose binding
protein, to the hydrophobic neurotensin receptor, a GPCR, has been shown
to increase expression and facilitate purification (Tucker & Grisshammer,
1996). In addition, the crystallisation of cytochrome c oxidase was
facilitated by the addition of a monoclonal Fv fragment (Ostermeier et al,
1995). Alternatively, crystallisation in lipidic cubic phase has yielded a
high-resolution structure of bacteriorhodopsin (Landau & Rosenbusch,
1996; Pebay-Peyroula et al, 1997). However, widely applicable methods to
facilitate the crystallisation of membrane proteins have yet to be described.

WO 01/85962 PCT/GB01/02043

Membrane protein crystals often contain a very high solvent content
(65-80%; Abramson and Iwata (1999), in Protein Crystallisation, Ed. Terese
Bergfors, International University Line pp 199-210). This solvent space is
filled mainly with detergent micelles and can form very large gaps within
the crystal lattice structure, gaps which we have found are large enough to
accommodate other proteins.
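The solvent fraction of a crystal can be estimated from its unit cell with the Matthews coefficient, V_M = V_cell / (Z × M), where Z is the number of molecules of mass M (Da) in the cell, and solvent fraction ≈ 1 − 1.23/V_M. The Python sketch below assumes a protein partial specific volume of about 0.74 cm³/g (the origin of the 1.23 Å³/Da constant); the cell dimensions, Z and mass in the usage lines are hypothetical and are not taken from any crystal described here.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (cubic angstroms) from edges a, b, c (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def matthews(v_cell, z, mw_da):
    """Matthews coefficient V_M (Å^3/Da) and the estimated solvent fraction.

    z is the number of molecules of mass mw_da in the unit cell. The 1.23
    constant assumes a protein partial specific volume of ~0.74 cm^3/g.
    """
    vm = v_cell / (z * mw_da)
    return vm, 1.0 - 1.23 / vm

if __name__ == "__main__":
    # Hypothetical orthorhombic cell holding four 150 kDa molecules.
    v = cell_volume(100.0, 150.0, 200.0)
    vm, solvent = matthews(v, z=4, mw_da=150_000)
    print(f"V_M = {vm:.2f} A^3/Da, solvent fraction ~ {solvent:.0%}")
```

With these assumed numbers the estimate lands at about 75% solvent, inside the 65-80% range quoted above; the same two functions can be applied to a real cell once its parameters are known.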

Carter et al (1994), in Protein and Peptide Lett. 1:175, used a fusion
between a six amino acid fragment of an HIV polypeptide and glutathione
S-transferase (GST) to allow crystallisation and x-ray analysis of the HIV
fragment. In this case, the HIV fragment formed an extension to the GST,
forming an integral part of the GST structure.

In Privé (1994) Acta Cryst. D50:375-379 and Privé and Kaback (1996) J.
Bioenergetics Biomembranes 28:29-34, cytochrome b562 was cloned
within one of the extracellular loops of the LacY protein. This only acted to
extend the hydrophilic domains of the highly hydrophobic LacY protein,
and the fusion was not suitable for x-ray analysis since it did not
produce suitable crystals.

A similar approach was used by Iwata et al (1995) Nature 376:660, where
an antibody fragment was used to allow crystallisation of cytochrome c
oxidase.

However, the GST, cytochrome b562 "carrier" molecules and antibody
fragments used previously only provide an extra soluble domain, and none
is suitable, for example by being large enough, to accommodate
a second protein or protein fragment within its crystal lattice structure or
the available space of the crystal.

The cytochrome bo3 ubiquinol oxidase from E. coli is a member of the
heme-copper superfamily of proton-pumping respiratory oxidases (for a
review of the heme-copper respiratory oxidases, see Garcia-Horsman et al.
(1994) J. Bacteriol. 176, 5587-5600). Cytochrome bo3 is a four-subunit
respiratory enzyme (Fig. 1) that catalyses the four-electron reduction of O2
to water and functions as a proton pump (Puustinen et al. (1991)
Biochemistry 30, 3936-).

The genes for the cytochrome bo3 subunits are organized within a single
operon called the cyo operon (Chepuri et al. (1990) J. Biol. Chem. 265,
11185-11192), which is under control of a constitutive, multicistronic
promoter (Minagawa et al. (1990) J. Biol. Chem. 265, 11198-11203). The
sequence of the cyo operon and the amino acid sequences of the subunits
are disclosed in Chepuri et al. (supra) and in GenBank with accession
number J05492 (SEQ ID NO: 13). The cyo operon has been shown to
encode five open reading frames, cyoABCDE (cf. SEQ ID NOS: 1 and 13).
The gene products of cyoA, cyoB, cyoC and cyoD correspond to the
cytochrome bo3 subunits II, I, III and IV, respectively. The cyoE gene
encodes a protoheme IX farnesyltransferase (Saiki et al. (1993) Biochem.
Biophys. Res. Comm. 189, 1491-1497).

Purification of histidine-tagged cytochrome bo3 from E. coli, using Ni2+
affinity chromatography, is disclosed by Rumbley et al. (1997) Biochimica
et Biophysica Acta 1340, 131-142.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 

Structure of cytochrome bo3. The upper drawing shows a periplasmic
view of the protein, showing the position of subunit IV relative to the other
transmembrane spanning helices. The heme group is shown in the upper
right-hand corner. The lower drawing shows the same protein viewed
through the membrane.

Figure 2

Two-dimensional model of subunit IV of cytochrome bo3, showing the
three transmembrane spanning domains, intracellular N-terminus and
extracellular C-terminus.

Figure 3

Wire model of cytochrome bo3. Subunit IV is shown in the lower left-hand
corner with the three transmembrane spanning helices labeled. The space
available within the crystal lattice (>100 kDa) is shown adjacent.

Figure 4

The cytochrome bo3 fusion vectors. Expression of the separate subunits is
under control of a single multicistronic promoter. A multiple cloning site
was introduced at the 3' end of subunit IV and this was used to clone in
linkers, to act as bridge sequences, and the proteins of interest.

Figure 5 

Western blot analysis of membranes prepared from GO105 cells expressing
pMB908 (UBO only), pMB930, pMB946 and pMB947. Panel A shows a
blot probed with anti-His antibody directed against the His9 tag at the
C-terminal end of subunit II. This blot clearly shows the expression of all of
the modified UBO vector constructs (subunit II is 33 kDa) and that this
expression can be localized to the membrane. Panel B shows a blot probed
with anti-HA antibody directed against the HA tag in the linker in pMB947.
A clear band is seen corresponding to the 17 kDa subunit, which is not
present for the untagged control.

Figure 6 

Crystals of native cytochrome bo3, cytochrome bo3 + protein Z and
cytochrome bo3 + apo A-I. The crystals have different forms, with the
native cytochrome bo3 forming rod-like crystals (diffracting to 3.5 Å), while
the cytochrome bo3 + protein Z crystals (diffracting to 6 Å) form square plates.
Crystals of cytochrome bo3 + apo A-I (diffracting to 5 Å) form elongated
hexagonal plates.

Figure 7

Expression of cytochrome bo3 + GPCR fusion proteins. This blot was
probed with anti-HA antibody directed against the HA tag introduced at the
C-terminal end of subunit IV of cytochrome bo3 as a linker sequence. Lane
1 shows no detectable signal for cells only. Lane 2 shows the positive
control of bo3 + linker only, which gives a band of 14 kDa corresponding to
subunit IV only. Lanes 3 and 4 show the expression of subunit IV + either
M1 or CB2 receptor. Faint signals can be seen corresponding to the
full-length fusion in both cases.

Figure 8

Expression of cytochrome bo3 + leader peptide/ProW constructs. The blot
has been probed with an anti-His antibody and shows the specific (33 kDa)
bands corresponding to the His-tagged subunit II of cytochrome bo3 for the
control, native bo3, and the two fusion proteins. No band is present for
cells only.

Figure 9

Expression of cytochrome bo3 + Apo AI constructs. Each sample was
grown to an OD600 of 1.0 prior to harvest. Blot A was probed with Apo
AI-specific antibody and shows both whole cells and membranes for the
cytochrome bo3 + Apo AI constructs. Lane 1 shows the +ve control of a
fragment of pure human Apo AI and lane 2 the -ve control of the native
cytochrome bo3 construct (pMB908) with no detectable signal.
Cytochrome bo3 + Apo AI (pMB1241) yields two specific bands, a
full-length product (30 kDa) and a breakdown product (18 kDa). For
cytochrome bo3 + Strep-tag + Apo AI (pMB1242) only one full-length
band is detected, while no specific bands are detected for cytochrome bo3 +
Strep-HA-tag + Apo AI (pMB1243). Cytochrome bo3 + protein Z + Apo
AI (pMB1244) exhibits the highest level of expression, although this fusion
protein also undergoes a certain amount of breakdown, yielding a
full-length product (50 kDa) and a smaller fragment (14 kDa). The same pattern
of expression is seen for both whole cells and membranes, demonstrating the
localization of the fusion protein to the membrane. Blot B (lanes 1-8) was
probed with anti-His antibody specific for the His9 tag at the C-terminal end
of subunit II. No signal is detected for the -ve control of cells only. The
+ve control of cytochrome bo3 shows a distinct band corresponding to
subunit II (33 kDa). All the other constructs yield similar bands except
pMB1243 (not shown on this blot), and interestingly all seem to exhibit
higher expression levels than the control sample. Lane 9 of blot B shows
pMB1244 probed with peroxidase anti-peroxidase specific to protein Z.
Interestingly, only the breakdown product can be detected using this
conjugate; no full-length protein can be observed.

Figure 10

Effects of arabinose concentration on the expression of cytochrome bo3 +
pBAD. The blot has been probed with anti-His antibody specific for the
His-tag at the C-terminal end of subunit II. The cells containing the
cytochrome bo3 + pBAD construct were grown to the mid-log phase and then
induced with increasing concentrations of arabinose as detailed on the gel.
Cells were harvested 4 h post-induction. No signals are detected for either
GL101 cells only or for the uninduced control, while a strong band is
detected for the +ve control, cytochrome bo3 under control of the native
constitutive promoter. The production of cytochrome bo3 under control of
the inducible promoter showed a clear dose-response relationship, with
increasing concentrations of arabinose yielding increasing levels of
detectable protein up to 0.2% arabinose. The amount of protein expressed
at 0.2% arabinose was much higher than that under control of the native
constitutive promoter. However, arabinose concentrations higher than 0.2%
did not further increase the expression level (results not shown). The
results are the same for whole cells and membranes, demonstrating the
localization of the protein to the membrane.

Figure 11

Effects of induction time on the expression of cytochrome bo3 + pBAD
promoter. The blot is probed with anti-His antibody specific to the His-tag
at the C-terminal end of subunit II. A distinct band is seen for the control
of native bo3, while no signal is detected for the uninduced cytochrome bo3
+ pBAD. After the cells had reached the mid-log phase, protein production
was induced with 0.2% arabinose and the cells were incubated at +37 °C for
the times indicated on the gel prior to sampling. A strong signal is detected
one hour post-induction, with a slight increase thereafter up to 3 h after
induction. After this time there is a steady drop in the detectable expression
of the tagged protein, to about a 50% reduction after the cells had been
incubated overnight (18 h). It is unclear whether this loss in detectable
protein is a result of degradation of the whole cytochrome bo3 molecule or
of specific proteolysis of the His-tag.

Figure 12

Crystal packing for cytochrome bo3 + apo A-I. Cytochrome bo3 is shown in
white/pale grey and apo A-I in grey.

DISCLOSURE OF THE INVENTION

In a first aspect, this invention provides a recombinant vector comprising (i)
a promoter sequence and (ii) a nucleotide sequence encoding a first protein
which is a membrane protein or multisubunit protein and which, when
crystallized with a second protein, is capable of accommodating the second
protein in the crystal lattice; said recombinant vector further allowing for
the insertion of a further nucleotide sequence encoding a second protein to
be located, when crystallized, in the crystal lattice of the first protein,
wherein the resulting crystal lattice is capable of diffracting x-rays.

By "crystal lattice" we mean the crystal lattice produced by the first protein.

It will be appreciated, however, that the crystal lattice is formed by the
protein which makes and maintains most of the crystal contacts within the
lattice, and that the crystal lattice itself may be altered by the presence of a
second protein. Such an altered crystal lattice is included in our definition
of "crystal lattice".

A second aspect of the invention provides a recombinant vector comprising
(i) a promoter sequence and (ii) a nucleotide sequence encoding a first
protein which is a membrane protein or multisubunit protein and which
upon crystallization yields crystals having available space in the lattice, so
as to allow for the ordered packing of a second protein into the said
available space; said recombinant vector further allowing for the insertion
of a further nucleotide sequence encoding a second protein to be
accommodated, upon its crystallization, in the said available space in the
lattice of the first protein, wherein the resulting crystal lattice is capable of
diffracting x-rays.

The said "space" may be utilized to force a second "target" protein to pack
in an ordered manner into the crystal lattice of the first protein, which is
used as a "scaffold" molecule. In addition, fusing a second protein to a first
protein may facilitate the expression, folding and stability in E. coli of the
said second protein. The recombinant vectors according to the invention thus
provide a template for facilitated/improved expression, purification,
crystallization and subsequent structure determination of proteins.

In a preferred embodiment of the first and second aspects, the crystal
produced is one capable of diffracting x-rays or is one that is useful for
x-ray analysis. As is explained below, the degree of resolution to which a
crystal diffracts may be determined by the crystallization conditions.
Preferably the crystal lattice produced by the first and second proteins is
capable of diffracting x-rays to a resolution of at least 6 Å or 5 Å, more
preferably at least 4 Å, 3.5 Å, 3.25 Å or 3 Å. Still more preferably, the crystal
lattice can diffract x-rays to a resolution better than 2.75 Å or 2.5 Å.
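Resolution here is the smallest lattice spacing d resolved in the diffraction pattern, so a lower number in Å is better. Through Bragg's law, nλ = 2d sin θ, each limit corresponds to a maximum scattering angle. The minimal Python sketch below assumes first-order reflections and a Cu Kα wavelength of 1.5418 Å (an illustrative choice; synchrotron beamlines use other wavelengths). It converts a limit into the required angle 2θ and lists which of the preferred limits above a given crystal reaches.

```python
import math

CU_K_ALPHA_A = 1.5418  # Å; illustrative laboratory wavelength (assumption)
PREFERRED_LIMITS_A = (6.0, 5.0, 4.0, 3.5, 3.25, 3.0, 2.75, 2.5)  # limits from the text

def two_theta_deg(d_min, wavelength=CU_K_ALPHA_A):
    """Angle 2θ (degrees) at which first-order Bragg reflections reach spacing d_min (Å)."""
    s = wavelength / (2.0 * d_min)
    if s > 1.0:
        raise ValueError("spacings below wavelength/2 cannot be reached")
    return 2.0 * math.degrees(math.asin(s))

def limits_reached(d_min):
    """Preferred limits that a crystal diffracting to d_min (Å) reaches or betters."""
    return [limit for limit in PREFERRED_LIMITS_A if d_min <= limit]

if __name__ == "__main__":
    for d in (6.0, 3.5, 2.5):
        print(f"{d} A needs 2-theta ~ {two_theta_deg(d):.1f} deg; meets {limits_reached(d)}")
```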

The second protein can be expressed together with the first protein as a
fusion protein, or alternatively the second protein can be expressed
separately and positioned in the available space of the first protein by means
of non-covalent interactions. The specificity and affinity necessary for this
binding may be achieved by fusing protein tags or domains having such
affinity for each other to the first and second protein, respectively. When
the expression of a fusion protein is desired, the recombinant vector of the
invention is adapted to allow for the insertion of at least one further
nucleotide sequence, in particular a sequence encoding the second protein to
be accommodated, upon its crystallization, in the said available space in the
lattice of the first protein. It will be appreciated that the nucleotide
sequence encoding the second protein may be located at one or
more positions relative to the sequence encoding the first protein. In other
words, the sequences encoding the first and second proteins may be in any
order which provides for the second protein to be accommodated, upon its
crystallisation, in the crystal lattice of the first protein. Hence, the
sequences may be consecutive in any order, either contiguous or separated
by a further sequence, or they may be non-consecutive; for example, the
sequence of one protein may be inserted into the coding sequence of the
other protein. Where the sequence of one protein is inserted into the coding
sequence of the other protein, it is preferred that this is done such that the
reading frame of the "other protein" is not changed.
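Keeping the reading frame of the "other protein" unchanged imposes two simple conditions on an in-frame insertion: the inserted sequence must be a multiple of three nucleotides long and must not contain an in-frame stop codon. The Python sketch below checks both and, as a simplifying assumption, also requires the insertion point to fall on a codon boundary of the host sequence so that no existing codon is split. The function names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq):
    """Split a sequence into complete codons, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def insert_in_frame(host_cds, insert, position):
    """Insert `insert` into `host_cds` at nucleotide `position` without shifting the host frame."""
    host_cds, insert = host_cds.upper(), insert.upper()
    if not 0 <= position <= len(host_cds):
        raise ValueError("insertion point lies outside the host sequence")
    if position % 3 != 0:
        raise ValueError("insertion point must fall on a codon boundary")
    if len(insert) % 3 != 0:
        raise ValueError("insert length must be a multiple of 3")
    if STOP_CODONS & set(codons(insert)):
        raise ValueError("insert contains an in-frame stop codon")
    return host_cds[:position] + insert + host_cds[position:]

# Toy host Met-Lys-Pro-Gly-stop; a Gly-Ser insert placed after the Lys codon.
fused = insert_in_frame("ATGAAACCCGGGTAA", "GGCAGC", 6)
print(codons(fused))  # ['ATG', 'AAA', 'GGC', 'AGC', 'CCC', 'GGG', 'TAA']
```

Because the host codons downstream of the insert are untouched and the insert adds whole codons, the host protein is extended by the inserted residues and otherwise read unchanged.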

The recombinant vector according to the invention comprises a promoter
sequence operably linked to the structural gene(s) and is capable of
mediating the expression of the said first protein or the said fusion protein.
The term "operably linked" as used herein means that the promoter is
functionally linked to a structural gene in the proper position to express the
structural gene under control of the promoter.

The skilled person will be able to determine which proteins are suitable for
use as the said "first protein" based on the crystal lattice structure of said
protein, the size of available cavities in this structure and the positions of
sites available for attaching fusion partners. In determining whether a
protein is suitable for use as the said "first protein", it will be appreciated
that the size of available cavities or space in the crystal structure of the first
protein when crystallized in the absence of a second protein may not be
strictly limiting in practice. The first protein may assist in crystallization of
the second protein even if the crystal space generated by the first protein
and available to accommodate the second protein is not identical in
the presence and absence of the second protein. This flexibility is sufficient
to allow the first protein to modify its space group when crystallizing.
Thus, the space group of the crystals of the first protein alone may differ
from that of crystals of the first and second proteins together.

Hence, it will be appreciated that the crystal lattice of the first protein useful
in the present invention, when crystallised in the absence of a second
protein, may have spaces or gaps which are solvent-filled and which are
smaller than the second protein to be crystallised. Preferably the
gaps or spaces in the crystal lattice of the first protein are at least the same
size as the second protein.
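Comparing a gap with a candidate second protein is easiest when both are expressed as volumes. A rough conversion, assuming a typical protein partial specific volume of 0.74 cm³/g, gives about 1.23 Å³ per dalton. The Python sketch below applies it as a volume-only screen: shape, solvent channels and the lattice flexibility discussed above are ignored. The 100 kDa-equivalent gap mirrors the >100 kDa space indicated in Fig. 3, and the 30 kDa and 20 kDa masses are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
A3_PER_CM3 = 1e24         # cubic angstroms per cubic centimetre

def protein_volume_a3(mw_da, partial_specific_volume=0.74):
    """Approximate molecular volume (Å^3) of a protein of mass mw_da (Da).

    partial_specific_volume is in cm^3/g; 0.74 is a typical assumed value.
    """
    return mw_da * partial_specific_volume * A3_PER_CM3 / AVOGADRO

def gap_fits(gap_volume_a3, second_mw_da):
    """True if the gap volume is at least the estimated volume of the second protein."""
    return gap_volume_a3 >= protein_volume_a3(second_mw_da)

gap = protein_volume_a3(100_000)  # gap expressed as a 100 kDa equivalent
print(round(gap), gap_fits(gap, 30_000))
```

A positive result is a necessary, not sufficient, condition; the text above also allows the lattice to rearrange when the gap is initially smaller.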

The first protein may be any protein which is suitable for accommodating
the second protein in its crystal lattice. Hence, the first protein may be a
soluble protein, including a soluble multisubunit protein, or it may be a
membrane protein, including a membrane-associated protein or an integral
membrane protein. Where the first protein is a multisubunit protein it may
have any number of subunits, including 2, 3, 4, 5 or 6 or more. Preferably,
where the first protein is a multisubunit protein, it is not an antibody or
antibody fragment such as an Fv molecule or Fab-like molecule. Where the
first protein is an integral membrane protein, it may have one
transmembrane domain, or two or three or more transmembrane domains
(i.e. it may be a polytopic membrane protein). Preferably the first protein is
an integral membrane protein, and more preferably it has one
transmembrane domain. Still more preferably the first protein has 2 or 3 or
4 or 6 or 7 or 12 transmembrane domains.

The first protein may be of any size which, when crystallised, is capable of
the required accommodation of the second protein. Preferably the first protein
is bigger than 10 amino acids in total, more preferably bigger than 25, 50,
75 or 100 amino acids in total. Still more preferably, the first protein is
bigger than 150 aa, 200 aa, 250 aa, 300 aa, 400 aa, 500 aa, 600 aa, 700 aa,
800 aa or 1000 aa in total length. By "total length" we mean the total number
of amino acids in the first protein, including all component subunits where the
first protein is a multisubunit protein.

It will be appreciated that in order to crystallise the first and second proteins
together in a crystal lattice of a quality suitable for x-ray diffraction, it may
be necessary to conduct a screen for suitable and optimal conditions for
crystallisation. Suitable screens are discussed in more detail below. Such
screening is routine in the art of crystallisation. One way of determining
whether a crystal is capable of diffracting x-rays is to observe its
diffraction pattern.

Of the first and the second protein useful in the present invention, it may be
determined which protein constitutes the crystal lattice by determining
which protein contributes the greater proportion of the crystal contacts that
are made and maintained within the lattice.
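The criterion above can be made concrete once the intermolecular contacts of a solved lattice have been enumerated, for example as residue pairs from neighbouring molecules closer than some distance cutoff; how contacts are defined and counted is an assumption left to the user. The Python sketch below credits each contact to both partner molecules and reports which protein contributes the larger share. The contact list in the usage lines is invented for illustration.

```python
from collections import Counter

def lattice_former(contacts):
    """Return (protein, share) for the protein contributing most crystal contacts.

    `contacts` holds one (protein_a, protein_b) pair per intermolecular
    contact; each partner is credited with one contribution.
    """
    tally = Counter()
    for a, b in contacts:
        tally[a] += 1
        tally[b] += 1
    if not tally:
        raise ValueError("no contacts given")
    protein, count = tally.most_common(1)[0]
    return protein, count / sum(tally.values())

# Hypothetical tally: 6 first-first, 3 first-second and 1 second-second contacts.
contacts = [("first", "first")] * 6 + [("first", "second")] * 3 + [("second", "second")]
print(lattice_former(contacts))  # ('first', 0.75)
```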

In preferred forms of the first and second aspects of the invention, the said
first protein is E. coli cytochrome bo3 or E. coli fumarate reductase, or
variants thereof. As shown in Fig. 3, the crystals of cytochrome bo3
provide space for additional proteins at the C-terminal end of cytochrome
bo3 subunit IV. The term "variants thereof", as used herein, is intended to
mean e.g. polypeptides carrying modifications like substitutions, small
deletions, insertions or inversions, which polypeptides nevertheless have
substantially the functional properties of E. coli cytochrome bo3 or E. coli
fumarate reductase. It should be emphasized that the term "functional
properties", as used in this context, does not refer to biological activity, but
rather to the structural capability to assist in crystallization of the second
protein, for example to harbor a second protein in its "available space" in
order to facilitate crystallization of the said second protein.

The term "available space" is not to be construed as referring solely to a
"cavity" or "gap" within the crystal lattice of a said first protein. Rather, the
available space also comprises solvent channels in the said crystal lattice.
For instance, in the bo3 oxidase crystal (cf. Fig. 12), big "gaps" are
repeated, and these gaps are not isolated but connected by solvent channels.
In the bo3-Apo AI crystal (Fig. 12), Apo AI does not occupy a single gap;
rather, it extends through multiple gaps connected by solvent channels.

The second protein may be any second protein which is capable of being
accommodated in the crystal lattice of the first protein. In a preferred
embodiment, the second protein is smaller than the first protein. By
"smaller" we include the meaning that the second protein has a lower
molecular weight than the first protein. A lower molecular weight is any
mass which is less than that of the first protein. Preferably, the molecular
weight of the second protein is at least 1 kDa, 5 kDa, 10 kDa, 15 kDa or
20 kDa lower than that of the first protein. More preferably the molecular
weight of the second protein is smaller than that of the first protein by at
least about 25 kDa, 35 kDa, 45 kDa or 55 kDa or more. Similarly, it is
preferred that the second protein is not bigger than 150 kDa, more preferably
no bigger than 125 kDa, 110 kDa, 100 kDa or 90 kDa. The size of the second
protein may be less than 80 kDa or 70 kDa.
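The size preferences above reduce to a few comparisons on molecular weight. The Python sketch below simply encodes them as stated (all values in kDa) so that a candidate pair of proteins can be checked at a glance; the function name is hypothetical and the example masses are invented.

```python
LOWER_BY_KDA = (1, 5, 10, 15, 20, 25, 35, 45, 55)  # "at least ... lower than the first protein"
NOT_BIGGER_THAN_KDA = (150, 125, 110, 100, 90)     # "not bigger than ..."
LESS_THAN_KDA = (80, 70)                           # "may be less than ..."

def size_report(first_kda, second_kda):
    """Which of the stated size preferences a candidate second protein satisfies."""
    diff = first_kda - second_kda
    return {
        "smaller_than_first": diff > 0,
        "lower_by_at_least": [t for t in LOWER_BY_KDA if diff >= t],
        "not_bigger_than": [t for t in NOT_BIGGER_THAN_KDA if second_kda <= t],
        "less_than": [t for t in LESS_THAN_KDA if second_kda < t],
    }

if __name__ == "__main__":
    print(size_report(60, 30))   # hypothetical 60 kDa first protein, 30 kDa second
    print(size_report(100, 95))  # barely smaller candidate
```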

Hence, it will be appreciated that the term "available space" indicates the
space in a crystal lattice of a first protein which is not occupied by the first
protein. This space therefore may be occupied by a second protein or by
solvent molecules and still be referred to as "available space".

Furthermore, the term "available space" as defined above is not to be
construed as referring only to a fixed volume within the crystal lattice of a
said first protein in the absence of the second protein. Instead, the available
space is one which is flexible and may alter in size and/or shape according
to the nature of the second protein. In other words, the crystal space group
produced by crystallization of a first protein may not be the same as that
produced by crystallization of a first and second protein.

Hence, in one embodiment of the first and second aspects of the invention,
the crystal space group of the first protein on its own (i.e. when the first
protein is crystallized in the absence of the second protein) may be different
from that obtained by crystallization in the presence of, or when fused to, the
second protein.

It will be appreciated that the second protein need not be fused to the first
protein to allow its crystallization, and the first and second proteins may,
for example, be produced as a fusion, cleaved, then crystallized.

According to another preferred embodiment of the first two aspects of the
invention, the first and second proteins are fused to each other. The fusion
may be "direct", such that the amino acid sequences of the two proteins are
contiguous, or "indirect", where the amino acid sequences of the two
proteins are joined via a linking polypeptide sequence or sequences.

In a further preferred embodiment of the first and second aspects of the 
invention, the first protein is a multisubunit protein. The subunits may be 
held together by covalent or non-covalent bonds. An example of a 
multisubunit protein is E. coli cytochrome bo3. 

Preferably, the first protein is one whose crystal lattice, when crystallized
in the absence of a second protein, has solvent-filled gaps of a suitable
size for accommodating a second protein within the crystal lattice, whether
or not the original space group of the first protein is maintained when
accommodating the second protein. Hence, it will be appreciated that the
first protein may be a soluble protein.

When the "first protein" is E. coli cytochrome bo3, the nucleotide sequence
to be included in the recombinant vector of the invention is preferably
selected from

(a) the polypeptide coding regions of the nucleotide sequence shown as
SEQ ID NO: 13;

(b) nucleotide sequences which are, e.g., at least 90% or 95%
homologous to the nucleotide sequence shown as SEQ ID NO: 13
in the Sequence Listing, and which are capable of hybridizing, under
stringent hybridization conditions, to a nucleotide sequence
complementary to the polypeptide coding regions of the nucleotide
sequence as defined in (a); and

(c) other nucleic acid sequences encoding the same amino acid
sequences as those defined in (a) or (b). Numerous such nucleotide
sequences may be designed due to the degeneracy of the genetic
code.

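Item (b) refers to sequences that are, e.g., at least 90% or 95% homologous. Treating "homologous" as percent identity (an assumption; the text does not fix the measure), the figure still depends on how the alignment is scored. The Python sketch below takes two already-aligned sequences ('-' marking gaps) and divides identical positions by the columns in which neither sequence has a gap; other conventions, such as dividing by the full alignment length, give different values. Producing the alignment itself is outside the sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical positions / columns where neither
    sequence has a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)

print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))  # 90.0
print(percent_identity("ATGC-TGCAT", "ATGCATGCAT"))  # 100.0
```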
The term "stringent hybridization conditions" is known in the art from
standard protocols (e.g. Ausubel et al., supra) and could be understood as,
e.g., hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium
dodecyl sulfate (SDS), 1 mM EDTA at +65 °C, and washing in 0.1× SSC /
0.1% SDS at +68 °C.
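Preparing the hybridization and wash solutions above from concentrated stocks is a C1·V1 = C2·V2 calculation. The stock concentrations in the Python sketch below (1 M sodium phosphate, 20% SDS, 0.5 M EDTA, 20× SSC) are assumed, commonly used laboratory stocks rather than values given in this text, and pH adjustment of the phosphate stock is not modelled.

```python
def stock_volumes(final_volume_ml, components):
    """Stock-solution volumes (mL) for one batch, using C1 * V1 = C2 * V2.

    `components` maps a name to (stock_conc, final_conc) in matching units.
    The remainder of the final volume is made up with water.
    """
    volumes = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds the stock")
        volumes[name] = final_volume_ml * final / stock
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to fit in the final volume")
    volumes["water"] = water
    return volumes

# 1 L of hybridization solution and 1 L of wash solution from assumed stocks.
hybridization = stock_volumes(1000, {
    "sodium phosphate (M)": (1.0, 0.5),
    "SDS (% w/v)": (20.0, 7.0),
    "EDTA (mM)": (500.0, 1.0),
})
wash = stock_volumes(1000, {"SSC (x)": (20.0, 0.1), "SDS (% w/v)": (20.0, 0.1)})
print(hybridization)
print(wash)
```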

It should thus be understood that the nucleotide sequence coding for
cytochrome bo3, or a suitable variant thereof, is not limited strictly to the
sequence shown as SEQ ID NO: 13. Rather, it also encompasses
DNA molecules carrying modifications like substitutions, small deletions,
insertions or inversions, which nevertheless encode polypeptides having
substantially the functional properties of E. coli cytochrome bo3. As
mentioned above, the term "functional properties" does not in this context
refer to biological activity, but rather to the structural capability to harbor a
second protein in its "available space" in order to facilitate crystallization of
the said second protein.

The promoter sequence to be included in the recombinant vector may be the
one naturally associated with the DNA sequence encoding the first protein,
or of another origin. When the said first protein is E. coli cytochrome bo3,
the said promoter sequence could essentially comprise the cytochrome bo3
promoter sequence shown as positions 203 through 803 in SEQ ID NO: 1.
Alternatively, the said promoter could be an inducible promoter, such as the
promoter pBAD (Invitrogen Corp., CA, USA) as shown in Example 7,
below.

In a preferred form of the invention, the recombinant vector can further
comprise a nucleotide sequence encoding a linker amino acid sequence,
enabling the said first and second proteins to be expressed as a fusion
protein. The ideal linker should have the appropriate length and flexibility
to allow the second protein to be positioned in the available space of the
crystal lattice; it should not form hydrophobic interactions with lipophilic
structures such as host cell membranes or the protein core; it should not
adversely affect the expression, translocation and folding of the fusion
protein; it should not inhibit the functions of the first or second proteins;
and it should be stable in the host cell and during purification.
In addition, the linker may comprise sequences useful for the detection
and/or purification of the fusion protein by means of affinity methods,
especially when suitable antibodies are unavailable. The said linker amino
acid sequence can, e.g., be a Strep-tag having the amino acid sequence shown
as SEQ ID NO: 6 or a Strep-HA-tag having the amino acid sequence shown
as SEQ ID NO: 9. Preferably, the said linker amino acid sequence will be
adapted to enable, upon expression of the said first and second proteins,
the said second protein to be positioned in the said available space in the
lattice of the first protein. Thus, when the said first protein is E. coli
cytochrome bo3, the said nucleotide sequence coding for a linker amino
acid sequence can preferably be positioned at the 3'-end of the nucleotide
sequence coding for E. coli cytochrome bo3 subunit IV.
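The two linker peptides are related by simple concatenation: the Strep-HA-tag (SEQ ID NO: 9) is the Strep-tag (SEQ ID NO: 6) followed by the HA-tag (SEQ ID NO: 10), using the one-letter sequences quoted in Example 2. A minimal Python sketch checks this and gives a rough size for each linker, assuming a rule-of-thumb average residue mass of 110 Da (an assumption, not a value from the text):

```python
# Linker peptide sequences quoted in Example 2 (one-letter amino acid code).
STREP_TAG = "AWRHPQFGG"              # SEQ ID NO: 6
HA_TAG = "YPYDVPDYA"                 # SEQ ID NO: 10
STREP_HA_TAG = "AWRHPQFGGYPYDVPDYA"  # SEQ ID NO: 9

# Rule-of-thumb average residue mass in daltons (assumption, not from the text).
AVG_RESIDUE_DA = 110.0


def approx_mass_kda(seq: str) -> float:
    """Very rough peptide mass estimate based on length alone."""
    return len(seq) * AVG_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    # The Strep-HA linker is the Strep-tag followed directly by the HA-tag.
    assert STREP_TAG + HA_TAG == STREP_HA_TAG
    for name, seq in [("Strep-tag", STREP_TAG), ("Strep-HA-tag", STREP_HA_TAG)]:
        print(f"{name}: {len(seq)} residues, ~{approx_mass_kda(seq):.1f} kDa")
```

Either linker is thus only about 1-2 kDa, small compared with the second proteins it connects to subunit IV.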


As shown in Example 3, below, the recombinant vector according to the
invention can in addition comprise a nucleotide sequence encoding a
"protein Z" polypeptide having essentially the amino acid sequence shown
as SEQ ID NO: 14. Since protein Z is a highly soluble and stable protein
domain, its presence may facilitate the expression, solubilization,
purification and crystallisation of the fusion protein.

The recombinant vector according to the invention may in addition
comprise a nucleotide sequence encoding an affinity tag, e.g. a His-tag,
useful for detection and/or purification of the expressed protein(s). An
affinity tag is a protein sequence that provides a defined, strong and highly
specific non-covalent binding interaction with a ligand or another protein
sequence or domain. The presence of the affinity tag in the expressed
protein allows the detection and/or purification of said protein on the basis
of a reversible interaction between said affinity tag and its specific ligand,
said specific ligand being attached to an easily detected chemical entity
(e.g. a fluorophore or an enzyme) or to a chromatographic matrix,
respectively. When the first protein is E. coli cytochrome bo3, the affinity
tag could e.g. be attached to the nucleotide sequence encoding subunit II of
E. coli cytochrome bo3 (cf. positions 1746 through 1772 in SEQ ID NO: 1).

In a further important aspect of the invention, the recombinant vector
defined above further comprises a nucleotide sequence encoding the said
"second" protein. Preferably, the second protein is one as described above.
When cytochrome bo3 is the first protein, it is estimated that the "cavity" in
the cytochrome bo3 crystal (cf. Fig. 3) can harbor a protein having a
molecular mass of up to approximately 100 kDa.
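To put the ~100 kDa estimate in terms of size, a protein's volume can be approximated from its mass and a partial specific volume. The Python sketch below assumes a typical value of 0.73 cm³/g (not given in the text) and compares the approximate masses of the second proteins used in the Examples against the text's own ~100 kDa estimate. It says nothing about the actual shape of the cavity.

```python
import math

DALTON_G = 1.66054e-24    # grams per dalton
VBAR_CM3_PER_G = 0.73     # typical protein partial specific volume (assumption)


def protein_volume_a3(mass_kda: float, vbar: float = VBAR_CM3_PER_G) -> float:
    """Approximate molecular volume in cubic angstroms (1 A^3 = 1e-24 cm^3)."""
    mass_da = mass_kda * 1000.0
    return mass_da * DALTON_G * vbar / 1e-24


def equivalent_sphere_radius_a(volume_a3: float) -> float:
    """Radius (angstroms) of a sphere with the same volume."""
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)


# Approximate masses quoted in the Examples of this document (kDa).
CANDIDATES = {"protein Z": 6.6, "Apo AI": 28, "leader peptidase": 36,
              "ProW": 38, "CB2 receptor": 40, "M1 receptor": 51}
CAVITY_LIMIT_KDA = 100.0  # the text's own estimate for cytochrome bo3 crystals

if __name__ == "__main__":
    limit_v = protein_volume_a3(CAVITY_LIMIT_KDA)
    print(f"100 kDa ~ {limit_v:,.0f} A^3, "
          f"sphere radius ~{equivalent_sphere_radius_a(limit_v):.0f} A")
    for name, kda in CANDIDATES.items():
        v = protein_volume_a3(kda)
        print(f"{name:17s} {kda:5.1f} kDa ~{v:9,.0f} A^3 "
              f"within estimate: {kda <= CAVITY_LIMIT_KDA}")
```

Under these assumptions 100 kDa corresponds to roughly 1.2 × 10^5 Å³, i.e. a sphere of radius about 31 Å.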

As explained above, the presence of a second protein in a crystal of the first
protein may change the space group from that formed when the first protein
is crystallized in the absence of the second protein. It will therefore be
appreciated that the "cavity" in the cytochrome bo3 crystal may in fact be
capable of harbouring a protein larger than 100 kDa. Thus, owing to the
flexibility of the crystal space group, the predicted size of the "cavity" is
not to be considered limiting in the choice of second protein.

Consequently, at least when cytochrome bo3 is the first protein, the said
second protein preferably has a molecular mass below 100 kDa, such as
below 75 kDa, below 60 kDa or below 50 kDa. The skilled person will be
able to determine the possible size of the second protein, depending on the
crystal lattice structure and the available positions for the attachment of
fusion partners in the first protein to be used. Furthermore, the second
protein must be expressed in a correctly folded form in the system used, and
must be able to translocate within the host cell in a manner consistent with
the intended subcellular location and orientation of the fusion protein. In
addition, the function of the second protein should be maintained when
expressed in the system used, so as to allow a functional assay to be
performed demonstrating that the second protein is in its native or
native-like form.

In a preferred embodiment of the invention, the second protein is a 
membrane protein, and more preferably, it is an integral membrane protein. 
Such an integral membrane protein may have any number of 
transmembrane domains, ranging from one to twelve or more. 

Preferably, the second protein has a lower molecular weight than the first
protein.

Included in the invention is also a cultured host cell, e.g. an E. coli cell,
harboring a recombinant vector according to the invention, in particular a
recombinant vector comprising a nucleotide sequence encoding a said
second protein.

A further aspect of the invention is a process for the production of a fusion
protein comprising culturing a host cell as defined above under conditions
whereby the said fusion protein is produced, and recovering the said fusion
protein. A fusion protein obtained, or obtainable, by this process is included
in the invention.
In the case that the recombinant protein is a multisubunit complex held
together by non-covalent forces, the nucleic acid sequences encoding the
individual subunits of said complex may be introduced into the host
organism by use of more than one vector, each vector encoding one or more
of these subunits.

A further aspect of the invention provides a fusion protein comprising (i) a
first protein which is a membrane protein or multisubunit protein and
which, when crystallized with a second protein, is capable of
accommodating the second protein in the crystal lattice and (ii) a second
protein to be located, when crystallized, in the crystal lattice of the first
protein, wherein the resulting crystal lattice is capable of diffracting x-rays.

In yet another aspect, the invention provides a fusion protein comprising (i)
a first protein which is a membrane protein or multisubunit protein and
which upon crystallization yields crystals having available space in the
lattice, so as to allow for the ordered packing of a second protein into the
said available space; and (ii) a second protein to be accommodated, upon
crystallization, in the said available space, wherein the resulting crystal
lattice is capable of diffracting x-rays.

When the said first protein is E. coli cytochrome bo3, the said second
protein is preferably attached to subunit IV of E. coli cytochrome bo3.

In a preferred embodiment, either or both of the proteins comprised in the
fusion proteins of the invention are membrane proteins. In other words,
either the first or second protein, or both of them, are membrane proteins.



By "membrane protein" we include membrane associated proteins,
membrane inserted proteins (such as those which possess a hydrophobic
domain which is resident within a membrane but may not completely span
the membrane bilayer) and integral membrane proteins where the protein
possesses at least one transmembrane domain which spans the membrane
bilayer, such as single spanning integral membrane proteins and polytopic
membrane proteins. Preferably, the membrane protein is an integral
membrane protein.

More preferably, the first protein is one as defined above in relation to the 
first and second aspects of the invention. 


In a further aspect, the invention provides a method for crystallization of a
protein, comprising (i) obtaining a fusion protein comprising (I) a first
protein, which is a membrane protein or multisubunit protein and which
upon crystallization yields crystals having available space in the lattice, so
as to facilitate crystallization of a second protein; and (II) the said (second)
protein to be crystallized; and (ii) crystallizing the said fusion protein,
wherein the resulting crystal is capable of diffracting x-rays.

The said fusion protein could preferably be obtained by the processes
described above, in particular by expression of the recombinant vectors
according to the invention.

Preferably the first protein is a membrane protein, and more preferably it is 
an integral membrane protein. 


A still further aspect of the invention provides a method for crystallization
of a protein, comprising (i) obtaining a first protein which is an integral
membrane protein and which upon crystallization yields crystals having
available space in the lattice, so as to facilitate crystallization of a second
protein; (ii) obtaining the second protein to be crystallized; and (iii)
crystallising both the said proteins together, wherein the resulting crystal is
capable of diffracting x-rays.

The first and second proteins of this aspect could be produced as a fusion 
protein, preferably obtained by the processes described above, in particular 
by expression of the recombinant vectors according to the invention. 

In an embodiment of this aspect of the invention, the proteins could be
obtained by expressing the first and second proteins in two separate
expression systems and combining them after purification, either by simply
mixing the two purified protein samples or by soaking the second protein
into crystals of the first protein. It will be appreciated that in order to
introduce the second protein into the solvent gap of the crystals of the first
protein by soaking, a means of targeting the second protein to the precise
location within the crystal lattice of the first protein is necessary, i.e. some
form of protein-protein interaction. This can be achieved using high
affinity domains engineered into suitable sites within the proteins: when a
suitable concentration of the second protein is added to the crystals of the
first protein, the two proteins form a complex based on the affinity of the
two domains.

Preferably, the first protein useful in the methods of crystallisation 
according to the present invention is as defined above in respect of the 
recombinant vectors of the invention. 

In a preferred embodiment of the crystallisation methods of the invention, 
the second protein is as defined according to the first or second aspects of 
the invention. 
The second protein may be a soluble protein, a membrane associated protein
or an integral membrane protein. The method is particularly useful where
the second protein is an integral membrane protein, which is therefore
preferred.

In a preferred embodiment of the crystallisation method aspects of the
invention, the method further comprises a step wherein at least two
detergents are screened in the crystal growth conditions to identify which
one optimises the growth and/or diffraction of the resulting crystals. It is
known that detergent selection is important for obtaining well-diffracting
crystals. Where cytochrome bo3 is the first protein, a wide range of
detergents is tolerated, providing a broad choice of detergent with which to
maximise the resolving ability of the crystals produced.

Suitable detergents for screening include all detergents in the C7-C9 range.
Preferably, one of the detergents screened is octylglucoside.

Optimal crystal growth is that which gives a small number of large crystals,
preferably a single large crystal. Preferably, the resulting crystal is
assessed by its diffraction pattern, crystals which produce a diffraction
pattern being preferred to those which do not.

In a further preferred embodiment of these crystallisation method aspects,
the pH of crystallisation is optimised for crystal growth. The pH screening
may be performed within the range pH 6-8. Typically, an initial screen to
optimise the pH of crystallisation may test pH values of 6, 6.5, 7, 7.5 and 8.
Preferably, this embodiment comprises a further screen in which the region
around the optimal pH value identified by the initial screen is tested more
finely. For example, where pH 6.5 is identified as optimal in the first screen,
a subsequent pH screen may test pH values of 6.3, 6.4, 6.5, 6.6, 6.7 and 6.8.
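The two-stage pH screen can be expressed as a coarse grid followed by a finer grid placed around the best coarse value. A Python sketch follows; the -0.2/+0.3 offsets of the fine grid are chosen only to reproduce the worked example above and are an assumption, not a rule stated in the text. Values are clipped to the pH 6-8 range given above.

```python
def coarse_ph_grid(low: float = 6.0, high: float = 8.0,
                   step: float = 0.5) -> list[float]:
    """Evenly spaced pH values from low to high inclusive (initial screen)."""
    n = round((high - low) / step)
    return [round(low + i * step, 2) for i in range(n + 1)]


def fine_ph_grid(best: float, below: float = 0.2, above: float = 0.3,
                 step: float = 0.1, low: float = 6.0,
                 high: float = 8.0) -> list[float]:
    """Finer pH values around the best coarse value, clipped to [low, high].

    The default offsets reproduce the worked example in the text
    (6.5 -> 6.3 ... 6.8); they are an illustrative choice.
    """
    start = round(best - below, 2)
    n = round((below + above) / step)
    values = [round(start + i * step, 2) for i in range(n + 1)]
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    print(coarse_ph_grid())
    print(fine_ph_grid(6.5))
```

Rounding each value to two decimals avoids floating-point artefacts such as 6.3999999 appearing in the plate layout.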

According to a yet further preferred embodiment, the optimisation of
crystallisation pH is performed in addition to a screen to identify a detergent
which optimises the growth of well-diffracting crystals.
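Combining the detergent and pH screens gives one drop per detergent/pH pair, and the preferences stated above (diffracting crystals over non-diffracting ones, a few large crystals over many small ones) can be encoded as a sort key. A Python sketch: only octylglucoside is named in the text; the other two detergents are assumed examples from the C7-C9 range, and the observations are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Octylglucoside is named in the text; the other two are assumed C7/C9 examples.
DETERGENTS = ["octylglucoside", "heptylglucoside", "nonylglucoside"]
PH_VALUES = [6.0, 6.5, 7.0, 7.5, 8.0]  # initial pH screen from the text


@dataclass
class DropResult:
    detergent: str
    ph: float
    diffracts: bool      # did the crystal give a diffraction pattern?
    n_crystals: int      # number of crystals in the drop
    max_size_um: float   # longest dimension of the largest crystal


def screen_conditions(detergents, ph_values):
    """Every detergent/pH combination, one drop each."""
    return list(product(detergents, ph_values))


def rank_key(r: DropResult):
    # Diffracting first, then fewer crystals, then larger crystals.
    return (r.diffracts, -r.n_crystals, r.max_size_um)


if __name__ == "__main__":
    print(len(screen_conditions(DETERGENTS, PH_VALUES)), "drops to set up")
    # Made-up observations, for illustration only.
    results = [
        DropResult("octylglucoside", 6.5, True, 1, 300.0),
        DropResult("octylglucoside", 7.0, True, 5, 150.0),
        DropResult("nonylglucoside", 7.5, False, 1, 400.0),
    ]
    best = max(results, key=rank_key)
    print("best:", best.detergent, "pH", best.ph)
```

Because Python compares tuples element by element, the key applies the criteria strictly in order: a non-diffracting drop never outranks a diffracting one, however large its crystal.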

When the second protein is a hydrophobic protein, such as an integral
membrane protein, an altered pH and a higher PEG concentration (relative
to the conditions employed for crystallization of the first protein alone) may
be necessary.

PEG acts as a precipitant in crystallisation, altering the protein-solvent or
protein-protein contacts so that the protein molecules precipitate out of
solution, preferably as ordered crystals. Other precipitants are known to be
useful in crystallisation, including, for example, ammonium sulphate and
2-methyl-2,4-pentanediol (MPD). Alternative precipitants are given in
Bergfors (1999) in Protein Crystallization, Ed. Terese Bergfors,
International University Line, pp 41-50.


It will be appreciated that the crystal space group of the first protein alone
(i.e. in the absence of the second protein) may be different from that
obtained by crystallization of the first protein with the second protein.

An additional aspect of the invention provides a method of obtaining
structural data on a protein of interest, comprising the steps of
(i) obtaining the protein of interest;
(ii) crystallising said protein in the crystal lattice of another
protein, which crystal lattice is able to accommodate the
protein of interest; and
(iii) obtaining x-ray diffraction data from the crystal produced in
step (ii).

The crystallisation method used in step (ii) may be any suitable method.
Preferably it is a method according to the present invention as described
above.

The protein of interest may be obtained by any convenient method.
Advantageously, the protein of interest is obtained by expressing a
recombinant vector according to the present invention or by culturing a cell
according to the invention.

The protein of interest may be any protein for which structural data are
desired. Preferably, the protein of interest is one according to the definition
of the "second protein" given above. More preferably, the protein of
interest is an integral membrane protein.

Typically, the x-ray diffraction data are obtained to a level of resolution
which can yield structural information. This resolution may vary according
to the detail of structural information required. Preferably, the resolution is
6 Å or better, more preferably 5 Å or better, and still more preferably 4, 3.5,
3.2, 3 or 2.5 Å or better.
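In crystallographic usage a resolution of "at least 6 Å" means a high-resolution limit of 6 Å or smaller, so a lower number is better. A small Python sketch of that comparison against the preferred limits listed above (the function name is illustrative only):

```python
# Preferred resolution limits from the text, in angstroms (smaller is better).
PREFERRED_LIMITS_A = [6.0, 5.0, 4.0, 3.5, 3.2, 3.0, 2.5]


def best_limit_met(d_min_a: float):
    """Tightest preferred limit satisfied by data extending to d_min_a
    angstroms, or None if the data are worse than the loosest limit."""
    met = [limit for limit in PREFERRED_LIMITS_A if d_min_a <= limit]
    return min(met) if met else None


if __name__ == "__main__":
    # 3.5 A: cytochrome bo3 alone (Example 1); 6 A: bo3 + protein Z (Example 3).
    for d in (3.5, 6.0, 7.0):
        print(d, "->", best_limit_met(d))
```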

The present invention further provides a use of a recombinant vector, or a
cell, according to the invention in a method of obtaining structural data on
a protein of interest according to the invention.



In yet a further aspect, the invention provides a process for the production
of a recombinant vector according to the invention, comprising
(i) obtaining a recombinant vector comprising (I) a nucleotide sequence
encoding a first protein which is a membrane protein or multisubunit
protein and which, when crystallized with a second protein, is capable
of accommodating the second protein in the crystal lattice and (II) a
promoter operably linked to the said nucleotide sequence; and
(ii) introducing, into the said vector, nucleotide sequences facilitating
the insertion of further nucleotide sequences, wherein the resulting
crystal would be capable of diffracting x-rays.


The nucleotide sequences may be sequences including a restriction 
endonuclease cleavage site. 
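For the restriction sites used in Example 2, checking whether a candidate site already occurs elsewhere in a vector is a simple string search. The Python sketch below uses the standard recognition sequences of NotI, SacI, MluI, XbaI, NsiI and SphI (supplied here, not taken from the text). The example sequence is a hypothetical cloning cassette, not the sequence of pMB930 or SEQ ID NO: 1.

```python
import re

# Standard recognition sequences (5'->3'); all six are palindromic, so
# scanning one strand finds every site.
SITES = {
    "NotI": "GCGGCCGC",
    "SacI": "GAGCTC",
    "MluI": "ACGCGT",
    "XbaI": "TCTAGA",
    "NsiI": "ATGCAT",
    "SphI": "GCATGC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions of every (possibly overlapping) site match."""
    seq = seq.upper()
    return {name: [m.start() for m in re.finditer(f"(?={site})", seq)]
            for name, site in SITES.items()}


def unique_sites(seq: str) -> list[str]:
    """Enzymes whose site occurs exactly once, i.e. unique cutters."""
    return [name for name, hits in find_sites(seq).items() if len(hits) == 1]


if __name__ == "__main__":
    # Hypothetical NotI-SacI-MluI-XbaI cassette.
    cassette = "GCGGCCGC" + "GAGCTC" + "ACGCGT" + "TCTAGA"
    print(find_sites(cassette))
    print(unique_sites(cassette))
```

The zero-width lookahead in the pattern lets overlapping matches be reported, which matters for pairs such as NsiI and SphI whose sites can overlap (e.g. in GCATGCAT).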

An additional aspect of the invention provides a process for the production
of a recombinant vector according to the invention, comprising
(i) obtaining a recombinant vector comprising (I) a nucleotide sequence
encoding a first protein which is a membrane protein or multisubunit
protein and which upon crystallization yields crystals having
available space in the lattice, so as to allow for the ordered packing
of a second protein into the said available space, and (II) a promoter
operably linked to the said nucleotide sequence; and
(ii) introducing, into the said vector, nucleotide sequences facilitating the
insertion of further nucleotide sequences, wherein the resulting
crystal would be capable of diffracting x-rays.

The further nucleotide sequences are preferably sequences encoding at least 
one further protein which is the protein to be crystallized. 

The said recombinant vector obtained in step (i) could e.g. be the vector
designated pMB908 (disclosed as "pJRhisA" by Rumbley et al. (1997)
Biochimica et Biophysica Acta 1340, 131-142), which comprises the
nucleotide sequence shown as SEQ ID NO: 1.

The nucleotide sequences may be sequences including a restriction
endonuclease cleavage site.

EXPERIMENTAL METHODS 

The modified vector constructs were generated using standard methods,
such as molecular cloning methods, PCR, restriction analysis, DNA
preparative methods, etc. In this context, the term "standard methods" is to
be understood as referring to protocols and procedures found in an ordinary
laboratory manual such as: Current Protocols in Molecular Biology, editors
F. Ausubel et al., John Wiley and Sons, Inc. 1994, or Sambrook, J., Fritsch,
E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, 2nd Ed.,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY 1989.

Large-scale production of the cytochrome bo3 fusion protein was performed
in E. coli using a 10 L fermentor. E. coli cells were grown over a period of
8 h in LB supplemented with 1 g KH2PO4, 14.6 g K2HPO4, 20 ml Na-lactate,
0.5 ml 1 M MgSO4, 0.00214 g CuSO4, 0.0102 g vitamin B1, 0.0098 g
nicotinic acid and 0.05 g ampicillin per litre (Georgiou et al. (1988)
Biochim. Biophys. Acta 933, 179-183).

EXAMPLES OF THE INVENTION

EXAMPLE 1: Crystallization and structural determination of cytochrome
bo3
The structure of cytochrome bo3 from Escherichia coli was determined.
Plasmids encoding cytochrome bo3 ubiquinol oxidase with a carboxy-terminal
histidine tag on subunit II were transformed into an E. coli strain, GO105,
lacking terminal oxidases, as described in Kaysser et al. (1995)
Biochemistry 34, 13491-13501. Purified protein was crystallized using
polyethylene glycol 1500. Data collection from the crystal was
performed at beamline ID14/EH3 of the ESRF. Image data were processed to
3.5 Å resolution (Fig. 1) with an Rmerge value of 10.8% (for F > 1.0σ(F)).
The crystals belong to the space group C2221 with unit cell dimensions of
a = 92.1 Å, b = 372.5 Å and c = 232.7 Å. The asymmetric unit contained two
molecules of ubiquinol oxidase. The bo3 protein has a molecular weight of
144 kDa and occupies about 41% of the volume of the unit cell.
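The occupancy figure above can be related to the unit cell through the Matthews coefficient V_M (cell volume divided by the total protein mass in the cell). A minimal Python sketch, assuming 8 asymmetric units per C2221 cell, the two molecules per asymmetric unit and 144 kDa mass given above, and the usual Matthews approximation that the protein volume fraction is about 1.23/V_M (which presumes a partial specific volume near 0.74 cm³/g):

```python
# Cell and content taken from Example 1 (space group C2221).
A, B, C = 92.1, 372.5, 232.7   # orthorhombic cell edges, angstroms
ASU_PER_CELL = 8               # general positions in C2221 (assumption)
MOLECULES_PER_ASU = 2
MASS_DA = 144_000              # cytochrome bo3, daltons


def orthorhombic_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume in cubic angstroms for an orthorhombic cell."""
    return a * b * c


def matthews_coefficient(v_cell: float, mass_da: float, z: int) -> float:
    """V_M in A^3/Da; z is the number of molecules in the unit cell."""
    return v_cell / (mass_da * z)


def protein_fraction(v_m: float) -> float:
    """Matthews approximation: protein volume fraction ~ 1.23 / V_M."""
    return 1.23 / v_m


if __name__ == "__main__":
    v = orthorhombic_volume(A, B, C)
    z = ASU_PER_CELL * MOLECULES_PER_ASU
    v_m = matthews_coefficient(v, MASS_DA, z)
    print(f"V_cell = {v:.3e} A^3, Z = {z}, V_M = {v_m:.2f} A^3/Da, "
          f"protein fraction ~ {protein_fraction(v_m):.0%}")
```

With these assumptions the sketch gives V_M of about 3.5 Å³/Da and a protein fraction of roughly 35%, somewhat below the 41% quoted. The difference may reflect other assumptions (for example about bound detergent, lipid or partial specific volume) that this simple estimate does not model.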






EXAMPLE 2: Expression vector constructs 



A multiple cloning site (NotI, SacI, MluI and XbaI) was added to the 3'-end
of subunit IV of the cytochrome bo3 gene. A plasmid designated pMB908
(SEQ ID NO: 1), which comprises a cytochrome bo3 construct with a His9
tag at the C-terminus of subunit II, was used as starting material. The
plasmid pMB908 is identical to the pJRhisA plasmid described by Rumbley
et al. (1997) Biochimica et Biophysica Acta 1340, 131-142. The unique
restriction sites were added at the carboxy-terminus of subunit IV in pMB908
by the polymerase chain reaction (PCR) method of splicing by overlap
extension. The method uses four different primers: two outer primers
encompassing the entire sequence to be changed, one at either end of the
new construct (the 5'-primer is set forth as SEQ ID NO: 2 and the 3'-primer
as SEQ ID NO: 3), and two centrally positioned primers containing the
sequence for the restriction sites (NotI, SacI, MluI, XbaI) (the 5'-primer is
set forth as SEQ ID NO: 4 and the 3'-primer as SEQ ID NO: 5), giving rise
to two fragments with overlapping sequences. The PCR reactions were
carried out in two steps to yield a 2.4 kb fragment of the cytochrome bo3
gene flanked by two unique restriction sites (NsiI and SphI). This fragment
was sequenced and then ligated into the cytochrome bo3 gene, generating
the construct pMB930 (Fig. 4).
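Splicing by overlap extension relies on the 3' end of one PCR product matching the 5' end of the other, so that the annealed fragments can be extended into one continuous product. A minimal Python sketch of the sequence outcome only (it does not model the PCR itself), assuming a hypothetical minimum overlap of 10 bases and toy fragments that share the NotI-SacI part of a cloning cassette:

```python
def splice_by_overlap(left: str, right: str, min_overlap: int = 10) -> str:
    """Join two fragments where the 3' end of `left` equals the 5' end of `right`.

    Tries the longest possible overlap first and returns the spliced product.
    Raises ValueError if no overlap of at least `min_overlap` bases exists.
    """
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


if __name__ == "__main__":
    # Toy fragments (hypothetical): both carry the NotI + SacI part of a cassette.
    overlap = "GCGGCCGC" + "GAGCTC"
    left = "ATGACCGAT" + overlap
    right = overlap + "ACGCGTTCTAGATAA"
    print(splice_by_overlap(left, right))
```

Checking the longest overlap first avoids splicing on a short, coincidental match when a longer true overlap exists.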

Two sets of linkers were generated to act as bridge sequences between
subunit IV and the foreign fusion protein. A short oligonucleotide linker
coding for the Strep-tag, a nonapeptide having the sequence AWRHPQFGG
(SEQ ID NO: 6), was formed by annealing two single-stranded, synthetic
oligonucleotides coding for this sequence and containing a 5'-NotI site and
a 3'-MluI site (the forward-strand oligonucleotide is set forth as SEQ ID
NO: 7 and the reverse-strand oligonucleotide as SEQ ID NO: 8). The
Strep-tag is an amino acid sequence which was identified using phage
display on the basis of its affinity for streptavidin (Schmidt & Skerra (1993)
Protein Engineering 6, 109-122). A second linker (Strep-HA-tag;
AWRHPQFGGYPYDVPDYA; SEQ ID NO: 9), coding for both the Strep-tag
and the hemagglutinin (HA) tag YPYDVPDYA (SEQ ID NO: 10) (Kast et
al. (1996) J. Biol. Chem. 271(16), 9240-9248), was made by annealing
linear, single-stranded, synthetic oligonucleotides (the forward-strand
oligonucleotide is set forth as SEQ ID NO: 11 and the reverse-strand
oligonucleotide as SEQ ID NO: 12). The oligonucleotide cassettes were
generated by mixing 10 nmol each of the 5'- and 3'-oligonucleotides with
30 µl annealing buffer (500 mM NaCl, 100 mM Tris-HCl, pH 7.4, and 100
mM MgCl2) in a total volume of 300 µl. The samples were boiled for two
minutes and then allowed to cool to approximately +30 °C prior to storage
at -20 °C. The linkers were cloned into the NotI/MluI sites of the construct
pMB930, yielding constructs pMB946 and pMB947 coding for the Strep and
Strep-HA linker sequences, respectively.
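Assuming the annealing volumes are 30 µl of buffer in a 300 µl total, the annealing buffer is used as a 10-fold stock and each oligonucleotide ends up at about 33 µM. A short Python sketch of this arithmetic (function and constant names are illustrative):

```python
def diluted(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """Concentration after diluting stock_volume of a stock into total_volume."""
    return stock_conc * stock_volume / total_volume


def nmol_per_ul_to_um(nmol: float, volume_ul: float) -> float:
    """nmol per microlitre equals mmol per litre (mM); x1000 gives uM."""
    return nmol / volume_ul * 1000.0


OLIGO_NMOL = 10.0       # amount of each oligonucleotide (from the text)
BUFFER_UL = 30.0        # annealing buffer volume (assumed microlitres)
TOTAL_UL = 300.0        # final volume (assumed microlitres)
NACL_STOCK_MM = 500.0   # NaCl in the annealing buffer (from the text)

if __name__ == "__main__":
    print(f"dilution factor: {TOTAL_UL / BUFFER_UL:.0f}x")
    print(f"each oligo: {nmol_per_ul_to_um(OLIGO_NMOL, TOTAL_UL):.1f} uM")
    print(f"NaCl in the mix: {diluted(NACL_STOCK_MM, BUFFER_UL, TOTAL_UL):.0f} mM")
```

The same `diluted` helper applies to every other buffer component, since all are diluted by the same factor.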
The expression of all the modified cytochrome bo3 constructs was assessed
in two ways. Firstly, each construct was transformed into the E. coli strain
GO105 (Kaysser et al. (1995) Biochemistry 34: 13491-13501). These cells
lack endogenous oxidase activity and will only grow under aerobic
conditions after the introduction of a functional oxidase. All the vector
constructs produced GO105 cell colonies, indicating that the additions to
the sequence had not altered the function of the cytochrome bo3.
Secondly, a His-tag was present at the C-terminal end of subunit II of the
cytochrome bo3, and in pMB947 there was an HA-tag at the C-terminal end
of subunit IV. It was thus possible to assess the expression of the
constructs by Western blot analysis (Fig. 5) with antibodies directed
against these specific sequences.

EXAMPLE 3: Cytochrome bo3 - Protein Z fusion protein 

The polypeptide designated "Protein Z" or "Domain Z" (Nilsson et al.
(1987) Prot. Eng., 107-13; SEQ ID NO: 14) is a modified analogue of the
IgG-binding domain B of Staphylococcus aureus protein A (SPA).
Protein Z (6.6 kDa) has been used extensively as an affinity tag (for reviews,
see Nilsson et al. (1992) Curr. Opin. Struct. Biol. 2: 569-575; and LaVallie &
McCoy (1995) Curr. Opin. Biotech. 6: 501-506). The structure of SPA
domain B has been resolved to 2.8 Å (Deisenhofer (1981) Biochemistry 20,
2361-2370).

Protein Z was cloned into the NotI and MluI sites of the cytochrome bo3
fusion vector using standard methods, yielding pMB1048. Expression of
the cytochrome bo3 fusion construct was assessed by Western blot
analysis, and cells expressing the fusion protein were grown in a fermentor
(OD550 = 3.0), harvested and stored at -80 °C. Membranes were purified by
the following method. Cells from a 10 L culture were taken up to a final
volume of 1 L in lysozyme treatment buffer (200 mM Tris-HCl, pH 8.8, 20
mM EDTA, pH 8.0, and 500 mM sucrose), lysozyme was added to a final
concentration of 0.1%, and the cells were stirred for 30 min. The cells were
pelleted by centrifugation at 8,000 rpm for 20 min. The pellets were
resuspended in approx. 750 ml cell disruption buffer (5 mM EDTA, pH
8.0, 10 µM PMSF, 10 mM MgCl2 and several crystals of DNase I) and
stirred on ice for 15 min. The solution was sonicated in burst mode for
2 x 3 min. Unbroken cells were removed by centrifugation at 6,000 rpm for
20 min, and the supernatant was centrifuged at 45,000 rpm for 1 h. The
membrane pellets were resuspended in a minimal volume of buffer (20
mM Tris-HCl, pH 7.5, 300 mM NaCl and 2.5 mM imidazole). At this
point it was possible to freeze the membrane pellets at -80 °C prior to
solubilization and purification.

The membranes were solubilized in 1% dodecylmaltoside for 1 h at +4 °C.
Solubilized protein was harvested after ultracentrifugation at 100,000 x g
for 1 h and applied to a 65 ml bed volume Ni-NTA column (Qiagen)
equilibrated with 20 mM Tris-HCl, pH 7.5, 300 mM NaCl, 5 mM
imidazole and 0.03% dodecylmaltoside. The column was washed with
three bed volumes of the equilibration buffer to remove non-specifically
bound proteins, and the sample was then eluted with a linear 5-150 mM
imidazole gradient. The resulting chromatograms showed two clear
protein peaks, which were termed "low" and "high" imidazole based on
the elution concentration.


The fractions from the two peaks were pooled separately. The buffer was
then exchanged for 20 mM Tris-HCl, pH 7.5, containing 0.03%
dodecylmaltoside using an Amicon stirred ultrafiltration cell (100 kDa
cutoff filter) (Amicon, MA, USA). The two pools were then applied separately
to an anion exchange column, MonoQ 10/10 (Amersham Pharmacia
Biotech, Sweden), in the presence of 1 mM potassium ferricyanide, after
equilibrating the column with 20 mM Tris-HCl, pH 7.5, containing 1%
octylglucoside. The column was washed slowly with 6 bed volumes of
equilibration buffer and then eluted with a 0-600 mM NaCl gradient.
Fractions containing the fusion protein were pooled on the basis of
spectrophotometric readings. Using a Centricon ultrafiltration cell (100
kDa cutoff) (Millipore, MA, USA), the buffer was exchanged for 20 mM
Tris-HCl, pH 7.5, containing 1% octylglucoside, and the sample was
concentrated to 20 mg protein/ml.


Crystals of this fusion protein (Fig. 6) were obtained for both "low" and
"high" imidazole protein using the hanging drop vapor diffusion technique.
The protein solution contained 20 mM Tris-HCl, pH 7.5, and 1%
octylglucoside. A reservoir solution of 9-10% PEG 1500, 100 mM NaCl,
100 mM MgCl2 and 5% ethanol was used. The protein solution was mixed
in a 1:1 ratio with the reservoir solution and left to equilibrate at +4 °C.
These data demonstrate that the site at the C-terminal end of subunit IV can
be used to accommodate foreign proteins as fusion partners without
inhibiting crystal formation.

Crystals of cytochrome bo3 + Z were obtained under conditions similar to
those used for native cytochrome bo3 (Abramson et al., 2000), although the
crystals of the fusion protein grew over a wider pH range (6-8) than the
native protein (pH 7-7.5). In addition, it was possible to grow crystals using
protein from both the "low" and "high" imidazole peaks, whereas the native
protein only yielded reproducible crystals from the "low" imidazole peak.
The crystals of the fusion protein from both peaks grow as square plates, a
more regular shape than the rod-like crystals of native cytochrome bo3.
Data were collected from a cytochrome bo3 + protein Z fusion protein
crystal which diffracted X-rays to 6 Å and had space group P21 with cell
dimensions a = 93.4 Å, b = 328.7 Å, c = 131.1 Å and β = 92.1°, with four
molecules in the asymmetric unit. These data contrast with the wild-type
protein crystals, which have space group C2221 with cell dimensions
a = 91.3 Å, b = 370.3 Å, c = 232.4 Å (see Example 1).
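Because the two crystal forms differ in space group and cell, a like-for-like comparison is the unit-cell volume available per molecule. A small Python sketch using the cell parameters quoted above, assuming that the number of molecules per cell is the number of general positions of the space group (2 for P21, 8 for C2221) times the molecules per asymmetric unit (four for the fusion crystal as stated above; two for the native form, taken from Example 1):

```python
import math


def cell_volume(a: float, b: float, c: float, beta_deg: float = 90.0) -> float:
    """Unit-cell volume (A^3) for a monoclinic cell, V = a*b*c*sin(beta).

    With beta = 90 degrees this reduces to the orthorhombic case, V = a*b*c.
    """
    return a * b * c * math.sin(math.radians(beta_deg))


# Fusion crystal (P21): 2 general positions x 4 molecules per asymmetric unit.
FUSION_V = cell_volume(93.4, 328.7, 131.1, 92.1)
FUSION_Z = 2 * 4
# Native crystal (C2221): 8 general positions x 2 molecules per asymmetric unit.
NATIVE_V = cell_volume(91.3, 370.3, 232.4)
NATIVE_Z = 8 * 2

if __name__ == "__main__":
    for label, v, z in [("bo3 + Z (P21)", FUSION_V, FUSION_Z),
                        ("native (C2221)", NATIVE_V, NATIVE_Z)]:
        print(f"{label:15s} V = {v:.3e} A^3, {v / z:,.0f} A^3 per molecule")
```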



EXAMPLE 4: Cytochrome bo3 - GPCR fusion proteins 

Two G-protein coupled receptors (GPCRs), the human muscarinic 1 (M1)
receptor of 51 kDa (Allard et al. (1987) Nucleic Acids Res 15: 10604) and
the human cannabinoid 2 (CB2) receptor of 40 kDa (Munro et al. (1993)
Nature 365, 61-65), were cloned into the MluI and XbaI sites of the
cytochrome bo3 fusion vectors according to standard methods. The
receptors were expressed as fusion partners with subunit IV of cytochrome
bo3 (Fig. 7).

EXAMPLE 5: Cytochrome bo3 - Leader peptidase fusion protein and
cytochrome bo3 - ProW fusion protein









A fusion construct of cytochrome bo3 + E. coli leader peptidase
(Whitchurch & Mattick (1994) Gene 150: 9-15) (no tag sequence in vector)
was obtained as described above. The biological role of E. coli leader
peptidase (36 kDa) is to remove amino-terminal leader peptides from
exported proteins after they have crossed the plasma membrane. The
enzyme of 323 amino acid residues spans the membrane twice, with its
large carboxy-terminal domain protruding into the periplasm (for a review,
see Dalbey (1991) Mol. Microbiol. 5, 2855-2860).
A fusion construct of cytochrome bo3 + E. coli ProW (Gowrishankar (1989)
J. Bacteriol. 171: 1923-1931) (+ Strep-tag) was obtained as described above.
ProW is an E. coli inner membrane protein of 38 kDa that consists of a
100-residue-long periplasmic N-terminal tail followed by seven closely spaced
transmembrane segments (Cristobal et al. (1999) J. Biol. Chem. 274,
20068-20070). It is part of the ProU system, a member of the ATP-binding
cassette (ABC) superfamily of transporters (Lucht & Bremer (1994) FEMS
Microbiology Letters 14: 3-20). Both fusion constructs were shown to be
expressed (Fig. 8).


EXAMPLE 6: Cytochrome bo3 - Apo AI fusion protein 

Apolipoprotein AI (Apo AI; Sharpe et al. (1984) Nucleic Acids Res. 12:
3917-3932) is the major protein component (28 kDa) of serum high-density
lipoprotein (HDL) particles (for a review, see Hargrove et al. (1999)
J. Mol. Endocrinol. 22, 103-111). The structure of truncated human Apo
AI has been determined at 3 Å resolution (Borhani et al. (1999) Acta Cryst.
D55: 12291-12296).

The following fusion constructs of cytochrome bo3 and Apo AI were
generated:

pMB1241 Cytochrome bo3 + Apo AI (no tag)
pMB1242 Cytochrome bo3 + Strep-tag + Apo AI
pMB1243 Cytochrome bo3 + Strep-HA-tag + Apo AI
pMB1244 Cytochrome bo3 + Protein Z + Apo AI



Interestingly, the expression of these fusion proteins differed from that of
the earlier expressed proteins (Fig. 9). No expression of cytochrome bo3 +
Strep-HA-tag + Apo AI (pMB1243) could be detected. Cytochrome
bo3 + Protein Z + Apo AI (pMB1244) exhibited the highest level of
expression, although it underwent a certain amount of proteolytic
degradation. Cytochrome bo3 + Apo AI with no tag (pMB1241) exhibited
detectable levels of expression, although it too was degraded (50% loss).
The cytochrome bo3 + Strep-tag + Apo AI (pMB1242) construct was
expressed at relatively high levels and did not appear to be degraded. In
addition, the fusion proteins only appeared to express satisfactorily up to an
OD600 of approximately 1, after which the whole cytochrome bo3 - Apo AI
complex appeared to be broken down. These results show the individuality
of proteins expressed in this system and highlight the need to characterize
the growth and production of each fusion protein. The construct, the host
system and the cultivation conditions may be optimized with regard to the
properties of the target protein by a person skilled in the art, as shown in
the current example.

It was possible to obtain crystals of pMB1242, cytochrome bo3 + apo A-I.
These had an elongated hexagonal plate form. In addition, these crystals
were thicker and had sharper edges than the native crystals, indicating
ordered packing within the crystal lattice. These crystals diffracted to 5 Å
and belong to space group C2 with cell dimensions a = 93.4 Å,
b = 328.7 Å, c = 131.1 Å and β = 92.1°, with two molecules per asymmetric unit.
Crystals of pMB1246, cytochrome bo3 + ProW, were obtained under
slightly different conditions: the PEG 1500 concentration was 18% and the pH 6.5.
These crystals diffracted to 6 Å and belong to space group C2 with
cell dimensions a = 93.4 Å, b = 328.7 Å, c = 131.1 Å and β = 92.1°, with two
molecules per asymmetric unit. The wide belt-like loop of the
apolipoprotein circles the whole cytochrome bo3 molecule and forms
protein-protein contacts within the lattice. Given the function of apo A-I
in vivo, it is possible that it binds to the detergent micelle which
surrounds the hydrophobic portions of cytochrome bo3. The apolipoprotein
A-I appears to be accommodated within multiple gaps in the crystal lattice
which are connected by solvent channels.
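The reported cell parameters can be turned into a unit-cell volume and a volume per molecule, which is the first step of a Matthews-coefficient estimate of lattice packing and solvent content. The sketch below is illustrative only; it assumes that the reported angle of 92.1° is the monoclinic angle β (α = γ = 90°) and that space group C2 has four asymmetric units per cell (two centring points times two symmetry operations). Dividing the result by the molecular mass of the fusion protein would give V_M, but that mass is not stated here, so the step is left out. The function names are hypothetical.

```python
import math

# Space group C2: C-centring (2 lattice points) x 2-fold axis (2 operations)
# gives 4 asymmetric units per unit cell.
ASU_PER_CELL_C2 = 4


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume in cubic angstroms for a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def volume_per_molecule(volume: float, asu_per_cell: int, molecules_per_asu: int) -> float:
    """Split the cell volume evenly over all molecules in the cell."""
    return volume / (asu_per_cell * molecules_per_asu)


if __name__ == "__main__":
    # Cell reported for the cytochrome bo3 + apo A-I crystals (two molecules per ASU).
    v_cell = monoclinic_cell_volume(93.4, 328.7, 131.1, 92.1)
    v_mol = volume_per_molecule(v_cell, ASU_PER_CELL_C2, 2)
    print(f"V_cell ~ {v_cell:.3e} A^3, volume per molecule ~ {v_mol:.3e} A^3")
```

Because β is close to 90°, the sin β term changes the volume by less than 0.1%; the per-molecule figure is dominated by the cell edges and the eight molecules per cell.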

EXAMPLE 7: Expression of cytochrome bo3 under the control of an
inducible promoter.

The constitutive cytochrome bo3 promoter was replaced with an inducible
pBAD promoter cloned using the pBADHis vector (Invitrogen Corp., CA,
USA) as a template. The pBAD expression system is based on the araBAD
operon, which controls the arabinose metabolic pathway in E. coli. This
construct was expressed in an alternative cell line, GL101 (Rumbley et al.
(1997) Biochim. Biophys. Acta 1340: 131-142). These cells express
another form of oxidase and thus will grow under aerobic conditions in the absence
of bo3. The expression of cytochrome bo3 was induced with increasing
concentrations of arabinose once the cells reached an OD600 of 0.5.
Maximum expression of the cytochrome bo3 construct was observed at
0.2% arabinose, with higher concentrations causing no further significant
increase in expression.



The expression of cytochrome bo3 under control of the inducible promoter 
(pMB1127) was higher than that under control of the constitutive promoter
(pMB908) (Fig. 10). Time course studies showed that the maximum 
detectable expression took place within 3-4 hours post induction (Fig. 11). 



pBAD expression vectors were generated for cytochrome bo3 with the
MCS, the Strep-tag, the Strep-HA-tag and Protein Z, yielding plasmids
pMB1271, pMB1128, pMB1272 and pMB1270, respectively. Nucleotide
sequences coding for polypeptides such as Apo AI or the CB2 receptor
can be cloned into these vectors, and the expression of such polypeptides
under control of the inducible promoter can be determined.

EXAMPLE 8: Use of fumarate reductase as a scaffold molecule

The E. coli respiratory enzyme fumarate reductase (FRD) is a four-subunit
protein with a molecular mass of 121 kDa, which catalyzes the reduction of
fumarate to succinate using membrane-bound menaquinol in anaerobic
respiration (Kroger (1978) Biochim. Biophys. Acta 505, 129-145; Cole et
al. (1985) Biochim. Biophys. Acta 811, 381-403).

Recently, the structure of FRD has been solved to 3.3 Å (Iverson et al.
(1999) Science 284, 1961-1966) and subsequently to 2.8 Å (Tina Iverson,
personal communication). As with cytochrome bo3, the crystal
lattice of FRD incorporates a gap, which could be exploited for
crystallization of heterologous proteins. Two subunits of FRD, FrdC and
FrdD, each have three transmembrane helices, and it would be possible to make
fusion proteins at the C-termini of these subunits using constructs similar
to those described for cytochrome bo3. The expression level of FRD is
very high under anaerobic conditions, and purification is facilitated by the
fact that the protein is selectively solubilized by the nonionic detergent
Thesit (Maklashina (1998) J. Bacteriol. 180, 5989-5996), also called
Polidocanol. The same restriction sites as for cytochrome bo3 subunit IV
are generated at the 3'-ends of FrdC and FrdD, to allow simple transfer of
nucleic acid sequences encoding the target proteins between scaffold
molecules.
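Sharing the same 3'-end restriction sites between scaffolds only allows a simple cut-and-paste transfer if the target-protein coding sequence does not itself contain those sites. A minimal check is sketched below. The enzyme panel is a hypothetical placeholder, because the sites actually introduced at the 3'-ends of cyoD, FrdC and FrdD are not named here; the recognition sequences shown are the standard ones for those enzymes. All three are palindromic, so scanning the top strand also covers the bottom strand.

```python
# Hypothetical enzyme panel: placeholders, not the sites used in the constructs.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
}


def internal_sites(insert: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [1-based start positions]} for every site found in the insert.

    Assumes palindromic recognition sequences, so only the given strand is scanned.
    Overlapping occurrences are reported because the search advances by one base.
    """
    seq = insert.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Toy insert (not a real target gene) carrying one internal BamHI site.
    toy_insert = "atgaaacgtggatccctgtaa"
    print(internal_sites(toy_insert))  # {'BamHI': [10]}
```

An empty result means the insert can be moved between scaffolds with the shared sites without internal cleavage; any hit would call for a silent mutation or a different enzyme.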
CLAIMS

1. A recombinant vector comprising (i) a promoter sequence and (ii) a
nucleotide sequence encoding a first protein which is a membrane
protein or multisubunit protein and which, when crystallized with a
second protein, is capable of accommodating the second protein in the
crystal lattice; said recombinant vector further allowing for the insertion
of a further nucleotide sequence encoding a second protein to be located,
when crystallized, in the crystal lattice of the first protein wherein the
resulting crystal lattice is capable of diffracting x-rays.

2. A recombinant vector comprising (i) a promoter sequence and (ii) a
nucleotide sequence encoding a first protein which is a membrane
protein or multisubunit protein and which upon crystallization yields
crystals having available space in the lattice, so as to allow for the
ordered packing of a second protein into the said available space; said
recombinant vector further allowing for the insertion of a further
nucleotide sequence encoding a second protein to be accommodated,
upon its crystallization, in the said available space in the lattice of the
first protein wherein the resulting crystal lattice is capable of diffracting
x-rays.

3. A recombinant vector according to Claim 1 or 2 wherein the x-ray
diffraction is to a resolution of at least 5 Å.

4. A recombinant vector according to Claim 3 wherein the diffraction
resolution is at least 4 Å.



5. A recombinant vector according to any one of Claims 1 to 4 wherein the
first protein is a fusion partner of the second protein.
6. A recombinant vector according to any one of Claims 1 to 5 wherein the crystal
space group of the first protein when crystallised alone may be different
to that obtained by crystallisation with the second protein.

7. A recombinant vector according to any one of Claims 1 to 6 wherein the
said nucleotide sequence encoding a first protein is a sequence encoding
a multisubunit protein.

8. The recombinant vector according to any one of Claims 1 to 7 wherein
the said nucleotide sequence encoding a first protein is a sequence 
encoding a membrane protein. 

9. The recombinant vector according to any one of Claims 1 to 8 wherein
the said nucleotide sequence encoding a first protein is a sequence
encoding an integral membrane protein.

10. The recombinant vector according to Claim 9 wherein the integral
membrane protein has one transmembrane domain.

11. The recombinant vector according to any one of Claims 1 to 10 wherein
the size of the first protein encoded by the nucleotide sequence is more
than 10 amino acids in total.

12. The recombinant vector according to any one of Claims 1 to 10 wherein
the said nucleotide sequence encoding a first protein is a sequence 
encoding E. coli cytochrome bo3 or E. coli fumarate reductase, or 
variants thereof. 
13. The recombinant vector according to Claim 12 wherein the said
nucleotide sequence encoding a first protein is a sequence encoding E. 
coli cytochrome bo3 or a variant thereof. 

14. The recombinant vector according to Claim 13 wherein the said 
nucleotide sequence encoding E. coli cytochrome bo3 is selected from 

(a) the polypeptide coding regions of the nucleotide sequence shown as
SEQ ID NO: 13;

(b) nucleotide sequences capable of hybridizing under stringent 
hybridization conditions, to a nucleotide sequence complementary with 
the polypeptide coding regions of the nucleotide sequence as defined in 
(a), and 

(c) nucleic acid sequences which are degenerate as a result of the 
genetic code to a nucleotide sequence as defined in (a) or (b). 

15. The recombinant vector according to Claim 13 or 14 wherein the said 
promoter sequence essentially comprises the cytochrome bo3 promoter 
sequence shown as positions 203 through 803 in SEQ ID NO: 1.

16. The recombinant vector according to any one of Claims 1 to 15 wherein 
the said promoter is an inducible promoter. 

17. The recombinant vector according to any one of Claims 1 to 16, further 
comprising a nucleotide sequence encoding a linker amino acid 
sequence facilitating for the said first and second proteins to be 
expressed as a fusion protein. 

18. The recombinant vector according to Claim 17 wherein the said linker 
amino acid sequence is adapted to facilitate, upon expression of the said 
first and second proteins, for the said second protein to be positioned in
the said available space in the crystal lattice of the first protein. 

19. The recombinant vector according to Claim 17 or 18 wherein said linker
amino acid sequence is a Strep-tag having an amino acid sequence
shown as SEQ ID NO: 6.

20. The recombinant vector according to Claim 17 or 18 wherein said linker
amino acid sequence is a Strep-HA-tag having an amino acid sequence
shown as SEQ ID NO: 9.

21. The recombinant vector according to any one of Claims 17 to 19
wherein the said nucleotide sequence coding for a linker amino acid
sequence is positioned at the 3'-end of the nucleotide sequence coding
for E. coli cytochrome bo3 subunit IV.

22. The recombinant vector according to any one of Claims 1 to 21 in 
addition comprising a nucleotide sequence encoding a polypeptide 
having essentially an amino acid sequence shown as SEQ ID NO: 14. 

23. The recombinant vector according to any one of Claims 1 to 22, in
addition comprising a nucleotide sequence encoding an affinity tag. 

24. The recombinant vector according to Claim 23, wherein the said affinity
tag is a His-tag.

25. The recombinant vector according to Claim 23 or 24, wherein the said
first protein is E. coli cytochrome bo3 and wherein a nucleotide
sequence encoding an affinity tag is attached to the nucleotide sequence
encoding E. coli cytochrome bo3 subunit II.
26. The recombinant vector according to any one of Claims 1 to 25, further
comprising a nucleotide sequence encoding the said second protein. 

27. The recombinant vector according to Claim 26, wherein the said second
protein has a molecular mass below 100 kDa. 

28. The recombinant vector according to Claim 26 or 27 wherein the second 
protein has a lower molecular weight than the first protein. 


29. The recombinant vector according to any one of Claims 22 to 28 
wherein the said second protein is a membrane protein. 

30. A cultured host cell harbouring a recombinant vector as defined in any
one of Claims 26 to 29.

31. The host cell according to Claim 30 which is an E. coli cell. 

32. A process for the production of a fusion protein which comprises
culturing a host cell as defined in Claim 30 or 31 under conditions
whereby the said fusion protein is produced, and recovering the said
fusion protein.

33. A fusion protein obtained or obtainable by the process as defined in
Claim 32.

34. A fusion protein comprising (i) a first protein which is a membrane
protein or multisubunit protein and which upon crystallization yields
crystals having available space in the lattice, so as to allow for the
ordered packing of a second protein into the said available space; and
(ii) a second protein to be accommodated, upon crystallization, in the
said available space wherein the resulting crystal is capable of
diffracting x-rays.

35. A fusion protein comprising (i) a first protein which
is a membrane protein or multisubunit protein and which, when
crystallized with a second protein, is capable of accommodating the
second protein in the crystal lattice and (ii) a second protein to be
located, when crystallized, in the crystal lattice of the first protein
wherein the resulting crystal lattice is capable of diffracting x-rays.

36. A fusion protein according to Claim 34 or 35 wherein either or both of 
the first and second proteins are integral membrane proteins. 

37. The fusion protein according to any one of Claims 32 to 36 wherein the said first
protein is E. coli cytochrome bo3.

38. The fusion protein according to Claim 37 wherein the said second 
protein is attached to subunit IV of E. coli cytochrome bo3. 


39. A method for crystallization of a protein, comprising

(i) obtaining a fusion protein comprising (I) a first protein, which is a
membrane protein or multisubunit protein and which upon
crystallization yields crystals having available space in the
lattice, so as to facilitate crystallization of a second protein; and
(II) the said (second) protein to be crystallized; and
(ii) crystallizing the said fusion protein
wherein the resulting crystal is capable of diffracting x-rays.



40. A method for crystallization of a protein, comprising
(i) obtaining, according to the process as defined in Claim 32, a
fusion protein; and
(ii) crystallizing the said fusion protein
wherein the resulting crystal is capable of diffracting x-rays.

41. A method for crystallization of a protein, comprising 

(i) obtaining a fusion protein as defined in any one of Claims 33 to
38; and

(ii) crystallizing the said fusion protein. 

42. A method according to any one of Claims 39 to 41 wherein the first 
protein is an integral membrane protein. 



43. A method for crystallization of a protein, comprising 

(i) obtaining a first protein which is an integral membrane protein and
which upon crystallization yields crystals having available
space in the lattice so as to facilitate crystallization of a second
protein; and
(ii) obtaining the second protein to be crystallized; and
(iii) crystallizing both the said proteins together
wherein the resulting crystal is capable of diffracting x-rays.

44. A method according to Claim 43 wherein the second protein is soaked 
into a crystal of the first protein. 



45. A method according to any one of Claims 39 to 44 wherein the first 
protein is as defined in any one of Claims 1 to 24. 






46. A method according to any one of Claims 39 to 45 further comprising a
step wherein at least two detergents are screened in the crystal growth
conditions to identify which one optimizes the growth and/or diffraction
of the resulting crystals.

47. A method according to any one of Claims 39 to 46 further comprising a
step wherein the pH is optimized for crystal growth.

48. A method according to any one of Claims 39 to 47 wherein the crystal
space group of the first protein when crystallized alone may be different 
to that obtained by crystallization with the second protein. 


49. A method according to any one of Claims 39 to 48 wherein the second 
protein is an integral membrane protein. 

50. A method according to any one of Claims 39 to 49 wherein the second
protein has a lower molecular weight than the first protein.

51. A method of obtaining structural data on a protein of interest comprising
the steps of
(i) obtaining the protein of interest;
(ii) crystallising said protein in the crystal lattice of another
protein, which crystal lattice is able to accommodate the
protein of interest; and
(iii) obtaining x-ray diffraction data from the crystal produced in
step (ii).


52. A method according to Claim 51 wherein the crystallisation method is 
according to any one of Claims 39 to 49. 
53. A method according to Claim 51 or 52 wherein the protein of interest is
obtained by expressing a recombinant vector according to any one of 
Claims 26 to 29 or by culturing a cell according to Claim 30. 

54. A method according to any one of Claims 51 to 53 wherein the protein
of interest is an integral membrane protein. 

55. A method according to any one of Claims 51 to 53 wherein the x-ray
diffraction data is obtained to a resolution of at least 6 Å.


56. Use of a recombinant vector according to any one of Claims 27 to 29 or 
a cell according to Claim 30 in a method according to any one of Claims 
51 to 55. 

57. A process for the production of a recombinant vector according to Claim
1 comprising
(i) obtaining a recombinant vector comprising (I) a nucleotide
sequence encoding a first protein which is a membrane protein or
multisubunit protein and which, when crystallized with a second
protein, is capable of accommodating the second protein in the crystal
lattice and (II) a promoter operably linked to the said nucleotide
sequence; and
(ii) introducing, into the said vector, nucleotide sequences facilitating
the insertion of further nucleotide sequences
wherein the resulting crystal would be capable of diffracting x-rays.

58. A process for the production of a recombinant vector according to Claim
3, comprising
(i) obtaining a recombinant vector comprising (I) a nucleotide
sequence encoding a first protein which is a membrane protein or
multisubunit protein and which upon crystallization yields crystals
having available space in the lattice, so as to allow for the ordered
packing of a second protein into the said available space, and (II) a
promoter operably linked to the said nucleotide sequence; and
(ii) introducing, into the said vector, nucleotide sequences facilitating
the insertion of further nucleotide sequences
wherein the resulting crystal would be capable of diffracting x-rays.






59. The process according to Claim 57 or 58 wherein the said recombinant 
vector obtained in step (i) comprises the nucleotide sequence shown as 
SEQ ID NO: 1.
SEQUENCE LISTING

<110> Pharmacia & Upjohn AB 

<120> Fusion vectors 

<130> 00126 

<140> 
<141> 

<160> 14 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 10111 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> promoter
<222> (203)..(803)

<220>
<221> -35_signal
<222> (726)..(731)

<220>
<221> -10_signal
<222> (749)..(755)

<220>
<221> misc_feature
<222> (793)..(797)
<223> Shine-Dalgarno sequence

<220>
<221> misc_feature
<222> (804)..(1778)
<223> ORF coding for cyoA (bo3 subunit II) including His-tag

<220>
<221> misc_feature
<222> (1746)..(1772)
<223> His-tag

<220>
<221> misc_feature
<222> (1800)..(3791)
<223> ORF coding for cyoB (bo3 subunit I)

<220>
<221> misc_feature
<222> (3781)..(4395)
<223> ORF coding for cyoC (bo3 subunit III)

<220>
<221> misc_feature
<222> (4395)..(4724)
<223> ORF coding for cyoD (bo3 subunit IV)

<220>
<221> misc_feature
<222> (4736)..(5626)
<223> ORF coding for cyoE

<300> 

<301> Rumbley, J.N. 

<303> Biochim. Biophys. Acta 

<304> 1340 

<306> 131-142 

<307> 1997 

<313> 1 TO 10111 



<300> 

<301> Minagawa, J.

<303> J. Biol. Chem. 

<304> 265 

<305> 19 

<306> 11198-11203 

<307> 1990 

<313> 203 TO 303 



<400> 1
catcggtcga tcgacattct atctattctc cgtcgccgct gccgtaccag ggcttatttt 60
gctgctggtt tgccgccaga cgcttgaata tacacgagta aatgacaact ttatctcccg 120
taccgcatat ccggcaggtt atgcctttgc catgtggaca ctggcggcgg gcgtcagcct 180
gttggccgtg tggttactgc tattgacgat ggacgcgctg gatttgacgc acttctcttt 240
cctgcctgct ctgctggaag tcggggtttt agtcgccctt tctggcgtcg tgcttggtgg 300
tttgctggat tatctggcgc tacgaaaaac gcatctgacg taatctgtaa atattattta 360
gctgtagtaa tcactcgccg taattattac ggctaaataa acatcagtat cgcttattaa 420
tatttgttct tttgtgcggc ttagcgtttg gaaattattg gcgccattta taaattctat 480
ttttcttaca cgattcagct aatgagtctt tattttctca tcacccagtt gtcactctaa 540
tgataattat ttgttgaata attgttttat ttcacattgg ttataccaat tgcccgccca 600
gactccttca gcactcccct tttgttataa cgcccttttg caacagcttc ttaaaatcaa 660
cctgatatgt tttgccaaca tatgtgacct ggcagccaaa tccaagtaac aggaatttaa 720
tcatgtttac agtaatttaa ccttcccgta aaatgcccac acactttaaa cgccaccaga 780
tcccgtggaa ttgaggtcgt taaatgagac tcaggaaata caataaaagt ttgggatggt 840
tgtcattatt tgcaggcact gtattgctca gtggctgtaa ttctgcgctg ttagatccca 900
aaggacagat tggtctggag caacgttcac tgatactgac ggcatttggc ctgatgttga 960
ttgtcgttat tcccgcaatc ttgatggctg ttggtttcgc ctggaagtac cgtgcgagca 1020
ataaagatgc taagtacagc ccgaactggt cacactccaa taaagtggaa gctgtggtct 1080
ggacggtacc tatcttaatc atcatcttcc ttgcagtact gacctggaaa accactcacg 1140
ctcttgagcc tagcaagccg ctggcacacg acgagaagcc cattaccatc gaagtggttt 1200
ccatggactg gaaatggttc ttcatctacc cggaacaggg cattgctacc gtgaatgaaa 1260
tcgctttccc ggcgaacact ccggtgtact tcaaagtgac ctccaactcc gtgatgaact 1320
ccttcttcat tccgcgtctg ggtagccaga tttatgccat ggccggtatg cagactcgcc 1380
tgcatctgat cgccaacgaa cccggcactt atgacggtat ctccgccagc tacagcggcc 1440
cgggcttctc aggcatgaag ttcaaagcta ttgcaacacc ggatcgcgcc gcattcgacc 1500
agtgggtcgc aaaagcgaag cagtcgccga acaccatgtc tgacatggct gcgttcgaaa 1560
aactggccgc gcctagcgaa tacaaccagg tggaatattt ctccaacgtg aaaccagact 1620
tgtttgccga tgtaattaac aagtttatgg ctcacggtaa gagcatggac atgacccagc 1680
cagaaggtga gcacagcgca cacgaaggta tggaaggcat ggacatgagc cacgcggaat 1740
ccgcccatca ccatcaccat caccatcacc atcgataaag gggttgagga agaataaaga 1800
tgttcggaaa attatcactt gatgcagtcc cgttccatga acctatcgtc atggttacga 1860
tcgctggcat tattttggga ggtctggcgc tcgttggcct gatcacttac ttcggtaagt 1920
ggacctacct gtggaaagag tggctgacct ccgtcgacca taaacgcctc ggtatcatgt 1980
atatcatcgt ggcgattgtg atgttgctgc gtggttttgc tgacgccatt atgatgcgta 2040
gccagcaggc tcttgcctcg gcgggcgaag cgggcttcct gccacctcac cactacgatc 2100
agatctttac cgcgcacggc gtgattatga tcttcttcgt agcgatgcct ttcgttatcg 2160
gtctgatgaa cctggtggtt ccgctgcaga tcggcgcgcg tgacgttgcg ttcccgttcc 2220
tcaacaactt aagcttctgg tttaccgttg ttggtgtgat tctggttaac gtttctctcg 2280 
gcgtgggcga atttgcgcag accggctggc tggcctatcc accgctatcg ggaatagagt 2340 
acagtccggg agtcggtgtc gattactgga tatggagtct ccagctatcc ggtataggta 2400 
cgacgcttac cggtatcaac ttcttcgtta ccattctgaa gatgcgcgca ccgggcatga 2460 
ccatgttcaa gatgccagta tttacctggg catcactgtg cgcgaacgta ctgattattg 2520 
cttccttccc aattctgacg gttaccgtcg cgttgttgac cctggatcgc tatctgggca 2580
cccatttctt taccaacgat atgggtggca acatgatgat gtacatcaac ctgatttggg 2640 
cctggggcca cccggaagtt tacatcctga tcctgcctgt tttcggtgtg ttctccgaaa 2700 
ttgcggcaac cttctcgcgt aaacgtctgt ttggttatac ctcgctggta tgggcaaccg 2760 
tctgtatcac cgtgctgtcg ttcatcgttt ggctgcacca cttctttacg atgggtgcgg 2820 
gcgcgaacgt aaacgccttc tttggtatca ccacaatgat tatcgccatc ccgaccgggg 2880 
tgaagatctt caactggctg ttcaccatgt atcagggccg catcgtgttc cattctgcga 2940 
tgctgtggac catcggtttt atcgtcacct tctcggtggg cgggatgact ggcgtgctgc 3000 
tggccgtacc gggcgcggac ttcgttctgc ataacagcct gttcctgatt gcgcacttcc 3060 
ataacgtgat catcggcggc gtggtcttcg gctgcttcgc agggatgacc tactggtggc 3120 
ctaaagcgtt cggtttcaaa ctgaacgaaa cctggggtaa acgcgcgttc tggttctgga 3180
tcatcggctt cttcgttgcc tttatgccac tgtatgcgct gggcttcatg ggcatgaccc 3240 
gtcgtttgag ccagcagatt gacccgcagt tccacaccat gctgatgatt gcagccagcg 3300 
gtgcagtact gattgcgctg ggtattctct gcctcgttat tcagatgtac gtttctattc 3360 
gcgaccgcga ccagaaccgt gacctgactg gcgacccgtg gggtggccgt acgctggagt 3420 
gggcaacctc ttccccgcct ccgttctata actttgccgt agtgccgcac gttcacgaac 3480 
gtgatgcatt ctgggaaatg aaagagaaag gcgaagcgta taaaaagcct gaccactatg 3540 
aagaaattca tatgccgaaa aacagcggtg caggtatcgt cattgcagct ttctccacca 3600 
tcttcggttt cgccatgatc tggcatatct ggtggctggc gattgttggc ttcgcaggca 3660 
tgatcatcac ctggatcgtg aagagcttcg acgaggacgt ggattactac gtgccggtgg 3720 
cagaaatcga aaaactggaa aaccagcatt tcgatgagat tactaaggca gggctgaaaa 3780 
atggcaactg atactttgac gcacgcgact gcccacgcgc acgaacacgg gcaccacgat 3840 
gcaggcggaa ccaaaatctt cggattttgg atctacctga tgagcgactg cattctgttc 3900 
tctatcttgt ttgctaccta tgccgttctg gtgaacggca ccgcaggcgg cccgacaggt 3960 
aaggacattt tcgaactgcc gttcgttctg gttgaaactt tcttgctgtt gttcagctcc 4020 
atcacctacg gcatggcggc tatcgccatg tacaaaaaca acaaaagcca ggttatctcc 4080 
tggctggcgt tgacctggtt gtttggtgcc ggatttatcg ggatggaaat ctatgaattc 4140 
catcacctga ttgttaacgg catgggtccg gatcgcagcg gcttcctgtc agcgttcttt 4200 
gcgctggtcg gcacgcacgg tctgcacgtc acttctggtc ttatctggat ggcggtgctg 4260 
atggtgcaaa tcgcccgtcg cggcctgacc agcactaacc gtacccgcat catgtgcctg 4320 
agcctgttct ggcacttcct ggatgtggtt tggatctgtg tgttcactgt tgtttatctg 4380 
atgggggcga tgtaatgagt cattctaccg atcacagcgg cg'cgtccat ggcagcgtaa 4440
aaacctacat gacaggcttt atcctgtcga tcattctgac ggtgattccg ttctggatgg 4500 
tgatgacagg agctgcctct ccggccgtaa ttctgggaac aatcctggca atggcagtgg 4560
tacaggttct ggtgcatctg gtgtgcttcc tgcacatgaa taccaaatca gatgaaggct 4620
ggaacatgac ggcgtttgtc ttcaccgtgc taatcatcgc tatcctggtt gtaggctcca 4680
tctggattat gtggaacctc aactacaaca tgatgatgca ctaagagcgg cggttatgat 4740 
gtttaagcaa tacctgcaag taacgaaacc aggcatcatc tttggcaacc tgatctcggt 4800 
gattggggga ttcctgctgg cctcaaaggg cagcattgat tatcccctgt ttatctacac 4860 
gctggttggg gtgtcactgg ttgtggcgtc gggttgtgtg tttaacaact acatcgacag 4920
ggatatcgac agaaagatgg aaaggacgaa gaatcgggtg ctggtgaaag gcctgatctc 4980 
tcctgctgtc tcgctggtgt acgccacgtt gctgggtatt gctggcttta tgctgctgtg 5040 
gtttggcgcg aatccgctgg cctgctggct gggggtgatg ggctttgtgg tttatgtcgg 5100 
cgtttatagc ctgtacatga aacgccactc tgtctacggc acgttgattg gttcgctctc 5160 
cggcgctgcg ccgccggtga tcggctactg tgcggtaacc ggtgagttcg atagcggcgc 5220 
agcgatcctg ctggctatct tcagcctgtg gcagatgcct cactcctatg ccatcgccat 5280 
tttccgcttt aaggattacc aggcggcaaa cattccggta ttgccagtgg taaaaggcat 5340 
ttcggtggcg aagaatcaca tcacgctgta tatcatcgcc tttgccgttg ccacgctgat 5400 
gctctctctt ggcggttacg ctgggtataa atatctggtg gtcgccgcgg cggttagcgt 5460 
ctggtggtta ggtatggctc tgcgcggtta taaagttgct gatgacagaa tctgggcgcg 5520 
caagctgttc ggcttctcta tcatcgccat cactgccctc tcggtgatga tgtccgttga 5580 
ttttatggta ccggactcgc atacgctgct ggctgctgtg tggtaacaaa acctctctat 5640 
taaaaaggtg ctacggcacc ttttttctta gcattagaaa catatccctc tcgaaatatt 5700 
tactaaaaaa tccgcatgtt taccccattc gtttgccgct ttacactagt cgcgaattta 5760
aaacagaggt ggtaatgaac gattataaaa tgacgccagg tgagaggcgc gcgacctggg 5820 
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gtttagggac 
acggggcctg 
cgatcttccc 
gtgatgccgg 
atcgcgtagt 
cggtcggaca 
agcagcacgc 
ggcagtaccg 
acgatgagcg 
gataaactac 
acgaaagggc 
ttagacgtca 
ctaaatacat 
. atattgaaaa 
tgcggcattt 
tgaagatcag 
ccttgagagt 
atgtggcgcg 
ctattctcag 
catgacagta 
cttacttctg 
ggatcatgta 
cgagcgtgac 
cgaactactt 
tgcaggacca 
agccggtgag 
ccgtatcgta 
gatcgctgag 
atatatactt 
cctttttgat 
agaccccgta 
ctgcttgcaa 
accaactctt 
tctagtgtag 
cgctctgcta 
gttggactca 
gtgcacacag 
gctatgagaa 
cagggtcgga 
tagtcctgtc 
ggggcggagc 
ctggcctttt 
taccgccttt 
agtgagcgag 
tatttcacac 
. ccagtataca 
acacccgctg 
gtgaccgtct 
aggcagctgc 
tcatccgcgt 
cgggccatgt 
tttctgttca 
actgatgatg 
atgcggcggg 
gtaggtgttc 
cagggcgctg 
. gttgttgctc 
ggtgattcat 
aggagcacga 
cggctgctgg 
acagttctcc 



cgtattctcg 
ccaccatacc 
catcggtgat 
ccacgatgcg 
cgatagtggc 
gtgctccgag 
catagtgact 
gcataaccaa 
cattgttaga 
cgcattaaag 
ctcgtgatac 
ggtggcactt 
tcaaatatgt 
aggaagagta 
tgccttcctg 
ttgggtgcac 
tttcgccccg 
gtattatccc 
aatgacttgg 
agagaattat 
acaacgatcg 
actcgccttg 
accacgatgc 
actctagctt 
cttctgcgct 
cgtgggtctc 
gttatctaca 
ataggtgcct 
tagattgatt 
aatctcatga 
gaaaagatca 
acaaaaaaac 
tttccgaagg 
ccgtagttag 
atcctgttac 
agacgatagt 
cc'cagcttgg 
agcgccacgc 
acaggagagc 
gggtttcgcc 
ctatggaaaa 
gctcacatgt 
gagtgagctg 
gaagcggaag 
cgcatatggt 
ctccgctatc 
acgcgccctg 
ccgggagctg 
ggtaaagctc 
ccagctcgtt 
taagggcggt 
tgggggtaat 
aacatgcccg 
accagagaaa 
cacagggtag 
acttccgcgt 
aggtcgcaga 
tctgctaacc 
tcatgcgcac 
agatggcgga 
gcaagaattg 



ttgcgcatgc 
cacgccgaaa 
gtcggcgata 
tccggcgtag 
tccaagtagc 
aacgggtgcg 
ggcgatgctg 
gcctatgcct 
tttcatacac 
ctattcgatg 
gcctattttt 
ttcggggaaa 
atccgctcat 
tgagtattca 
tttttgctca 
gagtgggtta 
aagaacgttt 
gtgttgacgc 
ttgagtactc 
gcagtgctgc 
gaggaccgaa 
atcgttggga 
ctgcagcaat 
cccggcaaca 
cggcccttcc 
gcggtatcat 
cgacggggag 
cactgattaa 
taaaacttca 
ccaaaatccc 
aaggatcttc 
caccgctacc 
taactggctt 
gccaccactt 
cagtggctgc 
taccggataa 
agcgaacgac 
ttcccgaagg 
gcacgaggga 
acctctgact 
acgccagcaa 
tctttcctgc 
ataccgctcg 
agcgcctgat 
gcactctcag 
gctacgtgac 
acgggcttgt 
catgtgtcag 
atcagcgtgg 
gagtttctcc 
tttttcctgt 
gataccgatg 
gttactggaa 
aatcactcag 
ccagcagcat 
ttccagactt 
cgttttgcag 
agtaaggcaa 
ccgtggccag 
cgcgatggat 
attggctcca 



aaggagatgg 
caagcgctca 
taggcgccag 
aggatccaca 
gaagcgagca 
catagaaatt 
tcggaatgga 
acagcatcca 
ggtgcctgac 
ataagctgtc 
ataggttaat 
tgtgcgcgga 
gagacaataa 
acatttccgt 
cccagaaacg 
catcgaactg 
tccaatgatg 
cgggcaagag 
accagtcaca 
cataaccatg 
ggagctaacc 
accggagctg 
ggcaacaacg 
attaatagac 
ggctggctgg 
tgcagcactg 
tcaggcaact 
gcattggtaa 
tttttaattt 
ttaacgtgag 
ttgagatcct 
agcggtggtt 
cagcagagcg 
caagaactct 
tgccagtggc 
ggcgcagcgg 
ctacaccgaa 
gagaaaggcg 
gcttccaggg 
tgagcgtcga 
cgcggccttt 
gttatcccct 
ccgcagccga 
gcggtatttt 
tacaatctgc 
tgggtcatgg 
ctgctcccgg 
aggttttcac 
tcgtgaagcg 
agaagcgtta 
ttggtcactg 
aaacgagaga 
cgttgtgagg 
ggtcaatgcc 
cctgcgatgc 
tacgaaacac 
cagcagtcgc 
ccccgccagc 
gacccaacgc 
atgttctgcc 
attcttggag 



cgcccaacag 
tgagcccgaa 
caaccgcacc 
ggacgggtgt 
ggactgggcg 
gcatcaacgc 
cgatatcccg 
gggtgacggt 
tgcgttagca 
aaacatgagc 
gtcatgataa 
acccctattt 
ccctgataaa 
gtcgccctta 
ctggtgaaag 
gatctcaaca 
agcactttta 
caactcggtc 
gaaaagcatc 
agtgataaca 
gcttttttgc 
aatgaagcca 
ttgcgcaaac 
tggatggagg 
tttattgctg 
gggccagatg 
atggatgaac 
ctgtcagacc 
aaaaggatct 
ttttcgttcc 
ttttttctgc 
tgtttgccgg 
cagataccaa 
gtagcaccgc 
gataagtcgt 
tcgggctgaa 
ctgagatacc 
gacaggtatc 
ggaaacgcct 
tttttgtgat 
ttacggttcc 
gattctgtgg 
acgaccgagc 
ctccttacgc 
tctgatgccg 
ctgcgccccg 
catccgctta 
cgtcatcacc 
attcacagat 
atgtctggct 
atgcctccgt 
ggatgctcac 
gtaaacaact 
agcgcttcgt 
agatccggaa 
ggaaaccgaa 
ttcacgttcg 
ctagccgggt 
tgcccgagat 
aagggttggt 
tggtgaatcc 



tcccccggcc 

gtggcgagcc 

tgtggcgccg 

ggtcgccatg 

gcggccaaag 

atatagcgct 

caagaggccc 

gccgaggatg 

atttaactgt 

attcttgaag 

taatggtttc 

gtttattttt 

tgcttcaata 

ttcccttttt 

taaaagatgc 

gcggtaagat 

aagttctgct 

gccgcataca 

ttacggatgg 

ctgcggccaa 

acaacatggg 

taccaaacga 

tattaactgg 

cggataaagt 

ataaatctgg 

gtaagccctc 

gaaatagaca 

aagtttactc 

aggtgaagat 

actgagcgtc 

gcgtaatctg 

atcaagagct 

atactgtcct 

ctacatacct 

gtcttaccgg 

cggggggttc 

tacagcgtga 

cggtaagcgg 

ggtatcttta 

gctcgtcagg 

tggccttttg 

ataaccgtat 

gcagcgagtc 

atctgtgcgg 

catagttaag 

acacccgcca 

cagacaagct 

gaaacgcgcg 

gtctgcctgt 

tctgataaag 

gtaaggggga 

gatacgggtt 

ggcggtatgg 

taatacagat 

cataatggtg 

gaccattcat 

ctcgcgtatc 

cctcaacgac 

gcgccgcgtg 

ttgcgcattc 

gttagcgagg 



5880 

5940 

6000 

6060 

6120 

6180 

6240 

6300 

6360 

6420 

6480 

6540 

6600 

6660 

6720 

6780 

6840 

6900 

6960 

7020 

7080 

7140 

7200 

7260 

7320 

7380 

7440

7500 

7560 

7620 

7680 

7740 

7800 

7860 

7920 

7980 

8040 

8100 

8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 

8880 

8940 

9000 

9060 

9120 

9180 

9240 

9300 

9360 

9420 

9480 
tgccgccggc ttccattcag gtcgaggtgg cccggctcca tgcaccgcga cgcaacgcgg 9540
ggaggcagac aaggtatagg gcggcgccta caatccatgc caacccgttc catgtgctcg 9600
ccgaggcggc ataaatcgcc gtgacgatca gcggtccagt gatcgaagtt aggctggtaa 9660
gagccgcgag cgatccttga agctgtccct gatggtcgtc atctacctgc ctggacagca 9720
tggcctgcaa cgcgggcatc ccgatgccgc cggaagcgag aagaatcata atggggaagg 9780
ccatccagcc tcgcgtcgcg aacgccagca agacgtagcc cagcgcgtcg gccgccatgc 9840
cggcgataat ggcctgcttc tcgccgaaac gtttggtggc gggaccagtg acgaaggctt 9900
gagcgagggc gtgcaagatt ccgaataccg caagcgacag gccgatcatc gtcgcgctcc 9960
agcgaaagcg gtcctcgccg aaaatgaccc agagcgctgc cggcacctgt cctacgagtt 10020
gcatgataaa gaagacagtc ataagtgcgg cgacgatagt catgccccgc gcccaccgga 10080
aggagctgac tgggttgaag gctctcaagg g 10111



<210> 2 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 2 

ttcacgaacg tgatgcattc tgggaaatga 30



<210> 3 
<211> 30 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 3 

ctccttgcat gcgcaagcag aatacggtcc 30 



<210> 4 
<211> 52 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 4 

gcacagcggc cgcgagctca cgcgtatgtc tagataagag cggcggttat ga 52 



<210> 5 
<211> 49 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 5 

atctagacat acgcgtgagc tcgcggccgc tgtgcatcat catgttgta 49
<210> 6
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Strep-tag 
linker 

<400> 6 

Ala Trp Arg His Pro Gln Phe Gly Gly
1 5 



<210> 7 
<211> 34 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Strep-tag 
linker, forward strand 

<400> 7 

ggccgcgcct ggcgtcatcc tcaattcggt ggca



<210> 8 
<211> 34 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: Strep-tag 
linker, reverse strand 

<400> 8 

cgcgtgccaa attgaggatg acgccgccag gcgc 



<210> 9 
<211> 18 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Strep-HA-tag 



<400> 9 

Ala Trp Arg His Pro Gln Phe Gly Gly Tyr Pro Tyr Asp Val Pro Asp
1 5 10 15 

Tyr Ala 



<210> 10 
<211> 9 
<212> PRT 

<213> Artificial Sequence 



<220>

<223> Description of Artificial Sequence: HA-tag linker 



<400> 10 

Tyr Pro Tyr Asp Val Pro Asp Tyr Ala 
1 5 



<210> 11 
<211> 61 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Strep-HA-tag 
linker, forward strand 

<400> 11 

ggccgcgcct ggcgtcatcc tcaattcggt ggctacccat acgacgtccc agactacgct 60 



<210> 12 
<211> 61 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Strep-HA-tag 
linker, reverse strand 

<400> 12 

cgcgtagcgt agtctgggac gtcgtatggg tagccaccga attgaggatg acgccaggcg 60 



<210> 13 
<211> 5819 
<212> DNA 
<213> E. coli 

<220>

<221> misc_feature
<222> (801)..(1748)

<223> ORF coding for cyoA (bo3 subunit II)
<220>

<221> misc_feature
<222> (1770)..(3761)

<223> ORF coding for cyoB (bo3 subunit I)
<220>

<221> misc_feature
<222> (3751)..(4365)

<223> ORF coding for cyoC (bo3 subunit III)
<220>

<221> misc_feature
<222> (4365)..(4694)

<223> ORF coding for cyoD (bo3 subunit IV)
<220>

<221> misc_feature
<222> (4706)..(5596)
<223> ORF coding for cyoE 

<300> 

<301> Chepuri, V. 
<303> J. Biol. Chem. 
<400> 13

actactgtcg acattctatc tattctccgt cgccgctgcc gtaccagggc ttattttgct 60
gctggtttgc cgccagacgc ttgaatatac acgagtaaat gacaacttta tctcccgtac 120
cgcatatccg gcaggttatg cctttgccat gtggacactg gcggcgggcg tcagcctgtt 180
ggccgtgtgg ttactgctat tgacgatgga cgcgctggat ttgacgcact tctctttcct 240
gcctgctctg ctggaagtcg gggttttagt cgccctttct ggcgtcgtgc ttggtggttt 300
gctggattat ctggcgctac gaaaaacgca tctgacgtaa tctgtaaata ttatttagct 360
gtagtaatca ctcgccgtaa ttattacggc taaataaaca tcagtatcgc ttattaatat 420
ttgttctttt gtgcggctta gcgtttggaa attattggcg ccatttataa attctatttt 480
tcttacacga ttcagctaat gagtctttat tttctcatca cccagttgtc actctaatga 540
taattatttg ttgaataatt gttttatttc acattggtta taccaattgc ccgcccagac 600
tccttcagca ctcccctttt gttataacgc ccttttgcaa cagcttctta aaatcaacct 660
gatatgtttt gccaacatat gtgacctggc agccaaatcc aagtaacagg aatttaatca 720
tgtttacagt aatttaacct tcccgtaaaa tgcccacaca ctttaaacgc caccagatcc 780
cgtggaattg aggtcgttaa atgagactca ggaaatacaa taaaagtttg ggatggttgt 840
cattatttgc aggcactgta ttgctcagtg gctgtaattc tgcgctgtta gatcccaaag 900
gacagattgg tctggagcaa cgttcactga tactgacggc atttggcctg atgttgattg 960
tcgttattcc cgcaatcttg atggctgttg gtttcgcctg gaagtaccgt gcgagcaata 1020
aagatgctaa gtacagcccg aactggtcac actccaataa agtggaagct gtggtctgga 1080
cggtacctat cttaatcatc atcttccttg cagtactgac ctggaaaacc actcacgctc 1140
ttgagcctag caagccgctg gcacacgacg agaagcccat taccatcgaa gtggtttcca 1200
tggactggaa atggttcttc atctacccgg aacagggcat tgctaccgtg aatgaaatcg 1260
ctttcccggc gaacactccg gtgtacttca aagtgacctc caactccgtg atgaactcct 1320
tcttcattcc gcgtctgggt agccagattt atgccatggc cggtatgcag actcgcctgc 1380
atctgatcgc caacgaaccc ggcacttatg acggtatctc cgccagctac agcggcccgg 1440
gcttctcagg catgaagttc aaagctattg caacaccgga tcgcgccgca ttcgaccagt 1500
gggtcgcaaa agcgaagcag tcgccgaaca ccatgtctga catggctgcg ttcgaaaaac 1560
tggccgcgcc tagcgaatac aaccaggtgg aatatttctc caacgtgaaa ccagacttgt 1620
ttgccgatgt aattaacaag tttatggctc acggtaagag catggacatg acccagccag 1680
aaggtgagca cagcgcacac gaaggtatgg aaggcatgga catgagccac gcggaatccg 1740
cccattaaag gggttgagga agaataaaga tgttcggaaa attatcactt gatgcagtcc 1800
cgttccatga acctatcgtc atggttacga tcgctggcat tattttggga ggtctggcgc 1860
tcgttggcct gatcacttac ttcggtaagt ggacctacct gtggaaagag tggctgacct 1920
ccgtcgacca taaacgcctc ggtatcatgt atatcatcgt ggcgattgtg atgttgctgc 1980
gtggttttgc tgacgccatt atgatgcgta gccagcaggc tcttgcctcg gcgggcgaag 2040
cgggcttcct gccacctcac cactacgatc agatctttac cgcgcacggc gtgattatga 2100
tcttcttcgt agcgatgcct ttcgttatcg gtctgatgaa cctggtggtt ccgctgcaga 2160
tcggcgcgcg tgacgttgcg ttcccgttcc tcaacaactt aagcttctgg tttaccgttg 2220
ttggtgtgat tctggttaac gtttctctcg gcgtgggcga atttgcgcag accggctggc 2280
tggcctatcc accgctatcg ggaatagagt acagtccggg agtcggtgtc gattactgga 2340
tatggagtct ccagctatcc ggtataggta cgacgcttac cggtatcaac ttcttcgtta 2400
ccattctgaa gatgcgcgca ccgggcatga ccatgttcaa gatgccagta tttacctggg 2460
catcactgtg cgcgaacgta ctgattattg cttccttccc aattctgacg gttaccgtcg 2520
cgttgttgac cctggatcgc tatctgggca cccatttctt taccaacgat atgggtggca 2580
acatgatgat gtacatcaac ctgatttggg cctggggcca cccggaagtt tacatcctga 2640






tcctgcctgt tttcggtgtg ttctccgaaa ttgcggcaac cttctcgcgt aaacgtctgt 2700 
ttggttatac ctcgctggta tgggcaaccg tctgtatcac cgtgctgtcg ttcatcgttt 2760 
ggctgcacca cttctttacg atgggtgcgg gcgcgaacgt aaacgccttc tttggtatca 2820 
ccacaatgat tatcgccatc ccgaccgggg tgaagatctt caactggctg ttcaccatgt 2880 
atcagggccg catcgtgttc cattctgcga tgctgtggac catcggtttt atcgtcacct 2940 
tctcggtggg cgggatgact ggcgtgctgc tggccgtacc gggcgcggac ttcgttctgc 3000 
ataacagcct gttcctgatt gcgcacttcc ataacgtgat catcggcggc gtggtcttcg 3060 
gctgcttcgc agggatgacc tactggtggc ctaaagcgtt cggtttcaaa ctgaacgaaa 3120 
cctggggtaa acgcgcgttc tggttctgga tcatcggctt cttcgttgcc tttatgccac 3180
tgtatgcgct gggcttcatg ggcatgaccc gtcgtttgag ccagcagatt gacccgcagt 3240 
tccacaccat gctgatgatt gcagccagcg gtgcagtact gattgcgctg ggtattctct 3300 
gcctcgttat tcagatgtac gtttctattc gcgaccgcga ccagaaccgt gacctgactg 3360 
gcgacccgtg gggtggccgt acgctggagt gggcaacctc ttccccgcct ccgttctata 3420 
actttgccgt agtgccgcac gttcacgaac gtgatgcatt ctgggaaatg aaagagaaag 3480 
gcgaagcgta taaaaagcct gaccactatg aagaaattca tatgccgaaa aacagcggtg 3540 
caggtatcgt cattgcagct ttctccacca tcttcggttt cgccatgatc tggcatatct 3600 
ggtggctggc gattgttggc ttcgcaggca tgatcatcac ctggatcgtg aaaagcttcg 3660 
acgaggacgt ggattactac gtgccggtgg cagaaatcga aaaactggaa aaccagcatt 3720 
tcgatgagat tactaaggca gggctgaaaa atggcaactg atactttgac gcacgcgact 3780 
gcccacgcgc acgaacacgg gcaccacgat gcaggcggaa ccaaaatctt cggattttgg 3840 
atctacctga tgagcgactg cattctgttc tctatcttgt ttgctaccta tgccgttctg 3900 
gtgaacggca ccgcaggcgg cccgacaggt aaggacattt tcgaactgcc gttcgttctg 3960 
gttgaaactt tcttgctgtt gttcagctcc atcacctacg gcatggcggc tatcgccatg 4020 
tacaaaaaca acaaaagcca ggttatctcc tggctggcgt tgacctggtt gtttggtgcc 4080 
ggatttatcg ggatggaaat ctatgaattc catcacctga ttgttaacgg catgggtccg 4140 
gatcgcagcg gcttcctgtc agcgttcttt gcgctggtcg gcacgcacgg tctgcacgtc 4200
acttctggtc ttatctggat ggcggtgctg atggtgcaaa tcgcccgtcg cggcctgacc 4260
agcactaacc gtacccgcat catgtgcctg agcctgttct ggcacttcct ggatgtggtt 4320 
tggatctgtg tgttcactgt tgtttatctg atgggggcga tgtaatgagt cattctaccg 4380 
atcacagcgg cgcgtcccat ggcagcgtaa aaacctacat gacaggcttt atcctgtcga 4440
tcattctgac ggtgattccg ttctggatgg tgatgacagg agctgcctct ccggccgtaa 4500 
ttctgggaac aatcctggca atggcagtgg tacaggttct ggtgcatctg gtgtgcttcc 4560 
tgcacatgaa taccaaatca gatgaaggct ggaacatgac ggcgtttgtc ttcaccgtgc 4620
taatcatcgc tatcctggtt gtaggctcca tctggattat gtggaacctc aactacaaca 4680
tgatgatgca ctaagagcgg cggttatgat gtttaagcaa tacctgcaag taacgaaacc 4740 
aggcatcatc tttggcaacc tgatctcggt gattggggga ttcctgctgg cctcaaaggg 4800 
cagcattgat tatcccctgt ttatctacac gctggttggg gtgtcactgg ttgtggcgtc 4860 
gggttgtgtg tttaacaact acatcgacag ggatatcgac agaaagatgg aaaggacgaa 4920
gaatcgggtg ctggtgaaag gcctgatctc tcctgctgtc tcgctggtgt acgccacgtt 4980
gctgggtatt gctggcttta tgctgctgtg gtttggcgcg aatccgctgg cctgctggct 5040 
gggggtgatg ggctttgtgg tttatgtcgg cgtttatagc ctgtacatga aacgccactc 5100 
tgtctacggc acgttgattg gttcgctctc cggcgctgcg ccgccggtga tcggctactg 5160 
tgcggtaacc ggtgagttcg atagcggcgc agcgatcctg ctggctatct tcagcctgtg 5220 
gcagatgcct cactcctatg ccatcgccat tttccgcttt aaggattacc aggcggcaaa 5280 
cattccggta ttgccagtgg taaaaggcat ttcggtggcg aagaatcaca tcacgctgta 5340 
tatcatcgcc tttgccgttg ccacgctgat gctctctctt ggcggttacg ctgggtataa 5400 
atatctggtg gtcgccgcgg cggttagcgt ctggtggtta ggtatggctc tgcgcggtta 5460
taaagttgct gatgacagaa tctgggcgcg caagctgttc ggcttctcta tcatcgccat 5520 
cactgccctc tcggtgatga tgtccgttga ttttatggta ccggactcgc atacgctgct 5580 
ggctgctgtg tggtaacaaa acctctctat taaaaaggtg ctacggcacc ttttttctta 5640 
gcattagaaa catatccctc tcgaaatatt tactaaaaaa tccgcatgtt taccccattc 5700 
gtttgccgct ttacactagt cgcgaattta aaacagaggt ggtaatgaac gattataaaa 5760 
tgacgccagg tgagaggcgc gcgacctggg gtttagggac cgtattctcg ttgcgcatg 5819 



<210> 14 
<211> 58 
<212> PRT 

<213> Staphylococcus aureus 
Val Asp Asn Lys Phe Asn Lys Glu Gln Gln Asn Ala Phe Tyr Glu Ile
1 5 10 15

Leu His Leu Pro Asn Leu Asn Glu Glu Gln Arg Asn Ala Phe Ile Gln
20 25 30

Ser Leu Lys Asp Asp Pro Ser Gln Ser Ala Asn Leu Leu Ala Glu Ala
35 40 45

Lys Lys Leu Asn Asp Ala Gln Ala Pro Lys
50 55